Abstract

The Survey Propagation (SP) algorithm for solving k-SAT problems has recently been shown to be an
instance of the Belief Propagation (BP) algorithm. In this paper, we show that for general constraint-
satisfaction problems, SP may not be reducible from BP. We also establish the conditions under which
such a reduction is possible. Along the way, we present a unification of the existing SP
algorithms in terms of a probabilistically interpretable iterative procedure: weighted Probabilistic
Token Passing.
I. Introduction

Survey Propagation (SP) [1] is a recent algorithmic breakthrough in solving certain hard
families of constraint satisfaction problems (CSPs). Derived from statistical physics, SP first
demonstrated its power in solving classic prototypical NP-complete problems, the k-SAT
problems [2]. For random instances of these problems in the hard regime, SP has been shown to be the
first efficient solver [1]. Recently, SP has also been applied to other CSPs, including other
NP-complete problem families such as graph coloring (or q-COL) problems [3], as well as problems
arising in communications and data compression, for example coding for Blackwell
channels [4] and quantization of Bernoulli sequences [5]. In all these cases, great successes have
been demonstrated.

Powerful as it appears, SP remains largely a heuristic algorithm to date: an analytic
understanding of its algorithmic nature and a rigorous characterization of its performance
are still wide open and of great research importance.

Similar to the well-known Belief Propagation (BP) algorithm used in iterative decoding
[6] and statistical inference [7], SP operates by iteratively passing "messages" in a factor
graph representation [8] of the problem instance, where each variable vertex corresponds to a
variable whose value is to be decided and each function vertex corresponds to a local constraint
imposed on the variables. This observation has inspired a recent research effort to understand
whether SP may be viewed as a special case of BP. The significance of questions of this
kind has been witnessed repeatedly in the history of communication research, for example, in
understanding the Viterbi algorithm as a dynamic programming algorithm [9], in understanding
the turbo decoding algorithm [10] as an instance of Belief Propagation [11], and in unifying the
BCJR algorithm [12] and the Viterbi algorithm under the umbrella of the generalized distributive
law [13]. These unified frameworks have on one hand provided additional insights into the
nature of the algorithms, and on the other hand made the algorithms accessible to much
wider research communities. Specific to the question "is SP BP?": if SP may be understood as an
instance of BP, then the existing analytic techniques for BP are readily applicable to analyzing SP;
if SP cannot be characterized as a special case of BP, one is then motivated to seek a different
algorithmic framework to which SP belongs or to discover the unique algorithmic nature of SP.

The first result reporting that SP is an instance of BP is the work of [14] in the context of
k-SAT problems. This result is generalized in [15] to an extended version of SP for solving k-SAT
problems. Briefly, the authors of [15] present a Markov Random Field (MRF) [16] formalism
for k-SAT problems; a parameter, denoted by $\gamma$ in this paper, is used to parametrize the MRF.
When the BP algorithm is derived on such an MRF, the BP message-update equations result in
a family of SP algorithms, referred to as weighted SP or SP($\gamma$) in this paper, parametrized by
$\gamma \in [0, 1]$; when $\gamma = 1$, SP($\gamma$) is the original (non-weighted) SP. In addition to extending SP,
in the context of k-SAT problems, to a family of SP algorithms with tunable performance,
another significance of this result is a conclusive answer to the titular question in that context,
namely that SP is BP for the k-SAT problem family. This result was re-developed in our earlier
work [17], where a simpler MRF formalism using Forney graphs [18] is presented and a more
transparent reduction of BP messages to weighted SP messages is given.

The objective of this paper is to answer the question whether SP, and more generally weighted
SP, are special cases of BP for arbitrary CSPs beyond k-SAT problems. It is worth noting that
weighted SP has only been presented for k-SAT problems, although its principle may be extended
to other CSPs involving binary variables (see, e.g., [5]). Furthermore, because it results from
BP on a properly defined MRF, weighted SP, unlike the original (non-weighted) SP, has no
probabilistic interpretation independent of the MRF constructed in the style of [15]
or [17] and the BP algorithm derived from it. Thus to answer the question whether weighted
SP is BP for general CSPs, it is necessary to formulate weighted SP for arbitrary CSPs in a way that
generalizes non-weighted SP without relying on any MRF and BP formalism. For this reason,
this research, and hence the structure of this paper, roughly splits into two parts. The first part
answers the question of what SP and weighted SP exactly are by presenting a probabilistically
interpretable formulation of both non-weighted and weighted SP for arbitrary CSPs. The second
part presents an MRF formalism for general CSPs in the style of [15] or [17], derives the BP
update equations, and answers the question whether and how BP under such an MRF formalism
may be reduced to SP, if at all.

Although this paper focuses on the second part, namely, on answering whether SP algorithms 
are instances of BP on a properly defined MRF, our effort in establishing what SP algorithms 
are and how to formulate these algorithms for general CSPs is noteworthy. 

First, the notion of weighted SP, as noted earlier, has only been presented for k-SAT problems
as in [15] and in sporadic example applications involving only binary variables such as in [5].
As will become clear in this paper, the design philosophy of weighted SP for CSPs involving
binary variables (such as in [15] and [5]) is not readily extendable to arbitrary CSPs with arbitrary
variable alphabets, since an important notion underlying SP, namely, an appropriate extension
of variable alphabets, is blurred in the binary special cases.

Second, for non-weighted SP, we note that its formulation in the context of general CSPs
primarily exists in the literature of statistical physics (see, e.g., [19]). Although its design recipe
has been laid out for arbitrary CSPs, its exposition in the language of statistical physics has made it
rather difficult for readers with a primarily engineering or computer science background.

Thus, in addition to serving as the basis for the investigation of BP-to-SP reduction, the first
part of the paper also aims at providing a clean, transparent and easily accessible formulation of
SP algorithms in their most general form for arbitrary CSPs, without resorting to statistical physics
concepts.

II. Main Results and Paper Organization 
The main results of this paper are summarized as follows. 

In the first part, we formulate SP and weighted SP for general CSPs as what we call
"probabilistic token passing" (PTP) and "weighted probabilistic token passing" (weighted PTP)
respectively, where a message is a distribution (or non-negative function) on the set of "tokens"
associated with a variable. Here a "token" is a non-empty subset of the variable's alphabet.¹ It
has been previously observed in SP applied to various problems that a "joker" symbol is added
to the original variable alphabet. Here we point out that extending the alphabet by simply adding
a joker symbol is not sufficient for general CSPs, particularly for those involving non-binary
variables. We stress that the right extension of the variable alphabet is to replace it with the set
of all non-empty subsets of the original alphabet. Although an equivalent treatment has been
described in some previous literature for non-weighted SP [19], this perspective is for the first
time made explicit beyond the statistical physics context and for both non-weighted and weighted
SP. Based on this notion of alphabet extension, we generalize weighted SP for arbitrary CSPs
in the form of weighted PTP. In other words, the weighted PTP formulation presented in this
paper serves as a recipe for designing weighted SP algorithms for arbitrary CSPs.

¹In fact, more rigorously, a token is a non-empty subset of all possible assignments of a variable. In this paper, for more
mathematical rigor and clarity, we make a distinction between the alphabet of a variable and the set of all assignments to the
variable, where an assignment to variable $x_v$ is treated as a function mapping the singleton set $\{v\}$ to the alphabet of $x_v$.
Nevertheless, one may always identify the set of all assignments to $x_v$ with the alphabet of $x_v$ via a one-to-one correspondence
and loosely refer to the set of all assignments of a variable as the alphabet of the variable.
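The alphabet extension described above can be illustrated with a minimal sketch. The function name `tokens` is hypothetical and introduced only for illustration; it enumerates the non-empty subsets of an alphabet. For a binary alphabet the only non-singleton token is {0, 1}, which presumably plays the role of the joker, whereas a ternary alphabet already has four non-singleton tokens, so a single joker symbol cannot represent them all.

```python
from itertools import combinations

def tokens(alphabet):
    """Return all non-empty subsets of `alphabet` as frozensets."""
    letters = sorted(alphabet)
    return [frozenset(subset)
            for size in range(1, len(letters) + 1)
            for subset in combinations(letters, size)]

binary_tokens = tokens({0, 1})        # {0}, {1}, {0, 1}
ternary_tokens = tokens({1, 2, 3})    # all 2**3 - 1 non-empty subsets
print(len(binary_tokens), len(ternary_tokens))
```

An alphabet of size n yields 2**n - 1 tokens, so the extended alphabet grows quickly with the size of the original one.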

In the second part, we present an MRF formalism, which we refer to as "normally realized
MRF", for arbitrary CSPs using Forney graphs, generalizing the MRF construction in the
style of [15] and [17] presented for k-SAT problems. States, each consisting of a left state and
a right state, are introduced in the MRF, where the left state corresponds to the token passed
from the variable and the right state corresponds to the token passed from the constraint. For any
given CSP, the MRF is parametrized by a collection of weighting functions, each corresponding
to a variable in the CSP; in the k-SAT special case, these weighting functions may reduce to a
single parameter, $\gamma$. Noting the combinatorial importance of such an MRF in the context of k-SAT
problems [15], one expects that this general formulation of MRF for arbitrary CSPs may serve a
similar role, namely providing a combinatorial framework describing the topology of the solution
space [15]. This direction, while clearly deserving further investigation, is outside the scope
of this paper.

On the normally realized MRF formalism, we then proceed to derive the BP update equations
and investigate the reduction of BP to weighted PTP (noting that weighted PTP is weighted SP
and that non-weighted SP is a special case of weighted SP). Primarily re-developing the results of
[15] and [17] on BP-to-SP reduction, we show that for k-SAT problems, BP is readily reducible
to weighted PTP as long as a condition, which we refer to as the state-decoupling condition,
is imposed on the BP messages in initialization. An interesting fact about this condition in the
context of k-SAT problems is that as long as the condition is satisfied in the first BP iteration,
it will continue to be satisfied in all subsequent iterations. This forms the basis on which BP messages
may be simplified to the form of weighted PTP messages. This condition, also arising in [15]
and [17] as a peculiar and curious construction, had not been explained prior to this work. In this
paper, we argue that the state-decoupling condition plays a critical role in the reduction of the
weighted PTP messages from the BP messages derived from the MRF formalism in the style of
[15] and [17], or from the normally realized MRF presented in this paper. Using the example of
3-COL problems, we show that such a condition is also needed in all BP iterations for BP
to reduce to PTP. However, in that case, we show that this condition cannot be made to hold
in every BP iteration (except for the trivial cases in which the BP messages contain no useful
information) unless one manually imposes it by manipulating the BP messages
in each iteration. This result on one hand justifies the important role of the state-decoupling
condition in the reduction of BP to PTP and on the other hand asserts that BP is not PTP and
hence not SP for 3-COL problems.

At that point, one is ready to conclude that weighted PTP or weighted SP is not a special case
of BP for general CSPs. The manual manipulation of BP messages in 3-COL problems, which
results in what we call state-decoupled BP, brings up a further question, namely, for general
CSPs, whether PTP and weighted PTP are readily expressed as state-decoupled BP. We proceed
to show that for general CSPs, the reduction of weighted PTP from BP requires yet another
condition pertaining to the structure of the CSP. Briefly, this additional condition demands that
the constraints in the CSP be "locally compatible" with each other in some sense. We show
that the local compatibility condition of the CSP is the necessary and sufficient condition for
state-decoupled BP to reduce to weighted PTP or weighted SP. With that, we complete the
answer to the titular question "is SP BP?".

As mentioned earlier, in addition to answering whether SP is BP, another objective of this
paper is to explain SP as simply as possible. For this purpose, we have made an effort to
present this paper in a pedagogical manner, carrying along the examples of k-SAT and
3-COL problems throughout the paper.

The remainder of this paper is organized as follows. In Section III we present a generic
formulation of CSPs while also introducing various notations that will be used in later parts
of the paper. In Section IV we introduce the existing SP algorithms using the examples of
k-SAT problems and 3-COL problems, where we purposefully avoid SP formulations in the language
of statistical physics. We then proceed in Section V to present a general formulation of SP
algorithms in terms of PTP and weighted PTP. In Section VI we present the normally realized
MRF formalism and present results concerning the reduction of BP messages to SP messages.
At this time, how SP algorithms behave over iterations and how they solve a CSP are important
open problems. Although such questions are not of particular importance for the purpose of this
paper, completely ignoring them appears unsatisfactory to us and perhaps also to some readers.
For this reason, we present some preliminary results along those lines for understanding the
dynamics of PTP. These results are included in the Appendix so as to maintain the focus of
this paper. The paper is briefly concluded in Section VIII.
III. A Generic Formulation of Constraint Satisfaction Problems

Let $V$ be a finite set, in which each element will be referred to as a coordinate. Associated
with each $v \in V$, there is a finite alphabet $\chi_v$. We will assume throughout
this paper that the alphabets $\chi_v$, $v \in V$, are all identical, and they are therefore denoted by $\chi$. We note that
this slight loss of generality is made only to lighten the upcoming notation, and that there
is no difficulty in extending the results of this paper to more general cases where the $\chi_v$ differ
from each other. For any subset $U \subseteq V$, a $\chi$-assignment $x_U$ on $U$ is a function mapping $U$
into the set $\chi$. That is, a $\chi$-assignment $x_U$ specifies a way to assign each coordinate $u \in U$ a
value in $\chi$. The set of all $\chi$-assignments on $U$ will be denoted by $\chi^U$. When $U$ is a singleton set
$\{u\}$, which contains a single coordinate $u$, we will call a $\chi$-assignment $x_{\{u\}}$ on $\{u\}$ an elementary
($\chi$-)assignment and write it as $x_u$ for simplicity. Clearly, any given elementary $\chi$-assignment $x_u$
is uniquely specified by a value $r \in \chi$, which is the value in $\chi$ assigned to coordinate $u$. In
this case, this assignment is denoted by $r_u$. For example, if $\chi := \{0, 1\}$, then the only possible
$\chi$-assignments on $\{u\}$ are $0_u$ and $1_u$, which are the elementary assignments assigning 0 and 1
to coordinate $u$, respectively.

Suppose that $U \subseteq W \subseteq V$ and that $x_W$ is a $\chi$-assignment on $W$. We will use $x_{W:U}$ to denote
the (function) restriction of $x_W$ to $U$. For any subset of $\chi$-assignments $\Omega \subseteq \chi^W$ on $W$, we
denote the projection of $\Omega$ on $U$ by $\Omega_{:U}$. That is,

$$\Omega_{:U} := \{x_{W:U} : x_W \in \Omega\}.$$

If coordinate set $U$ can be partitioned into disjoint subsets $A$ and $B$, then it is obvious
that assignment $x_U$ decomposes into assignments $x_{U:A}$ and $x_{U:B}$, and $x_U$ may be written as
$(x_{U:A}, x_{U:B})$ (in any order). Evidently, $x_U$ may be decomposed according to any partition of $U$,
not necessarily a two-fold partition. In particular, if a collection of sets $\{U_i : i \in \mathcal{I}\}$, for some
index set $\mathcal{I}$, forms a partition of $U$, then we may write assignment $x_U$ as $(x_{U:U_i})_{i \in \mathcal{I}}$.

For simplicity, we will write $(x_A, x_B)$ and $(x_{U_i})_{i \in \mathcal{I}}$ in place of $(x_{U:A}, x_{U:B})$ and $(x_{U:U_i})_{i \in \mathcal{I}}$
respectively. In fact, unless some particular clarity is needed, we will always write $x_{W:U}$ simply
as $x_U$, making the underlying $x_W$ implicit. Furthermore, when $U$ is a singleton set $\{u\}$, as
mentioned earlier, we will simply denote it by $x_u$, which reduces to the conventional "variable"
notation standard in the literature on graphical models.
Given $\chi$ and $V$, the objective of a constraint satisfaction problem (CSP) is to find a global
$\chi$-assignment $x_V$ that satisfies a given set of constraints or to conclude that no such assignment
exists. Formally, we will use set $C$ to index the set of constraints $\{\Gamma_c : c \in C\}$. Each constraint
$\Gamma_c$, $c \in C$, applies to a subset of the coordinates $V$, which will be denoted by $V(c)$. Specifically,
each constraint $\Gamma_c$ is identified with a subset of $\chi^{V(c)}$, and the constraint is satisfied by global
$\chi$-assignment $x_V$ if $x_{V:V(c)} \in \Gamma_c$. Then any CSP may be formulated by specifying $V$, $C$, $\chi$,
$\{V(c) : c \in C\}$ and $\{\Gamma_c : c \in C\}$, where the objective of the CSP is to find a $\chi$-assignment $x_V$
such that

$$\prod_{c \in C} [x_{V:V(c)} \in \Gamma_c] = 1, \tag{1}$$

or to conclude that no such assignment exists. Here the notation $[P]$, for any Boolean proposition
$P$, is Iverson's convention [8], namely, evaluating to 1 if $P$ is true, and to 0 otherwise.
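The satisfaction test in (1) can be sketched in a few lines. The data layout is an assumption made for illustration only: an assignment is a dict from coordinates to values, `scopes` maps each constraint index c to V(c), and each constraint is a set of tuples listed in sorted coordinate order. The names `restrict` and `satisfies` are hypothetical.

```python
def restrict(assignment, coords):
    """Restriction of an assignment to `coords`, as a tuple in sorted coordinate order."""
    return tuple(assignment[v] for v in sorted(coords))

def satisfies(assignment, scopes, constraints):
    """True iff every Iverson bracket in the product (1) evaluates to 1."""
    return all(restrict(assignment, scopes[c]) in constraints[c] for c in scopes)

# Hypothetical toy CSP: two binary coordinates that must take different values.
scopes = {"c": {1, 2}}
constraints = {"c": {(0, 1), (1, 0)}}

print(satisfies({1: 0, 2: 1}, scopes, constraints))  # the two values differ
print(satisfies({1: 0, 2: 0}, scopes, constraints))  # the two values coincide
```

Representing each constraint extensionally, as a set of allowed tuples, mirrors the set-theoretic definition above but is only practical when every V(c) is small.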

Now it is easy to verify that the factorization structure of (1) can be represented by a factor
graph [8]: in the factor graph, "variable vertices" are indexed by $V$, where the "variable" indexed
by $v \in V$ represents an elementary assignment $x_{V:\{v\}}$ on $\{v\}$, or simply $x_v$; "function vertices"
are indexed by $C$, where the function indexed by $c \in C$ is $[x_{V:V(c)} \in \Gamma_c]$, which, with a slight
overloading of notation, will also be denoted by $\Gamma_c(x_{V(c)})$; there is an edge connecting variable
vertex $x_v$ with function vertex $\Gamma_c$ if and only if $v \in V(c)$. Inspired by its correspondence (to an
edge) in the factor graph, we will use $(v - c)$ to denote a coordinate-constraint pair $(v, c)$ where
coordinate $v$ is involved in constraint $\Gamma_c$ in the CSP.

For notational symmetry, we denote the set $\{c : v \in V(c)\}$ by $C(v)$, namely, $C(v)$ indexes
the set of all constraints involving coordinate $v$, or the set of all function vertices connecting
to variable vertex $x_v$. We will assume that $|C(v)| \ge 2$ for all $v \in V$. It is clear that such an
assumption is without loss of generality, since if a variable $x_v$ is involved in only one constraint,
one may always modify the constraint and remove the variable from the problem. Similarly, we
will assume that $|V(c)| \ge 2$ for every $c \in C$. This is also without loss of generality since if a
constraint $\Gamma_c$ only involves a single variable $x_v$, it is always possible to "absorb" this constraint
into other constraints involving $x_v$ (noting that $x_v$ must have another constraint since $|C(v)| \ge 2$).

A. k-SAT 

The k-SAT problems are a classic family of CSPs, known to be NP-complete for $k \ge 3$ [2].
An instance of k-SAT problems consists of a set of variables $\{x_v : v \in V\}$, each of which takes
on values from the set $\chi := \{0, 1\}$, and a set of constraints $\{\Gamma_c : c \in C\}$, each of which involves
exactly $k$ variables. For each constraint $\Gamma_c$ and every $v \in V(c)$, there is a value $L_{v,c} \in \{0, 1\}$
which we will refer to as the preferred value on $v$ in constraint $\Gamma_c$. The k-SAT problem is
then to decide on an assignment $x_V$ such that for each constraint $\Gamma_c$, at least one of its involved
coordinates is assigned its preferred value in $\Gamma_c$. To map back to the aforementioned set-theoretic
formulation of constraints, in a k-SAT problem, for each $c \in C$, let $l^c$ denote the $\chi$-assignment
on $V(c)$ in which every coordinate $v \in V(c)$ is assigned the negated value $\bar{L}_{v,c}$ of its preferred
value $L_{v,c}$ in $\Gamma_c$, namely $l^c_v = \bar{L}_{v,c}$ for every $(v - c)$; then constraint $\Gamma_c$ is defined as
$\Gamma_c := \chi^{V(c)} \setminus \{l^c\}$.
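The set-theoretic definition of a k-SAT constraint can be made concrete with a short sketch. The helper `ksat_constraint` and the tuple encoding (values listed in sorted coordinate order) are assumptions for illustration. The example clause uses preferred values 1, 0, 0 on coordinates 1, 2, 4, which removes the single assignment (0, 1, 1) and thus appears to correspond to the constraint labelled a in Fig. 1.

```python
from itertools import product

def ksat_constraint(scope, preferred):
    """All binary assignments on `scope` except the one negating every preferred value."""
    coords = sorted(scope)
    forbidden = tuple(1 - preferred[v] for v in coords)
    return {x for x in product((0, 1), repeat=len(coords)) if x != forbidden}

# Hypothetical clause on coordinates 1, 2, 4 with preferred values 1, 0, 0.
gamma_a = ksat_constraint({1, 2, 4}, {1: 1, 2: 0, 4: 0})
print(len(gamma_a))  # 2**3 - 1 allowed assignments
```

Each k-SAT constraint therefore excludes exactly one of the 2**k assignments on its scope.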

The factor-graph representation of a toy 3-SAT problem is shown in Fig. 1. For k-SAT
problems, it is convenient to treat each preferred value $L_{v,c}$ as the label for edge $(x_v, \Gamma_c)$ on the
factor graph, and use a dashed edge to represent label 0 and a solid edge to represent label 1.

We note that it is customary in this paper that variable vertices in a factor graph are listed on
the left side and function (constraint) vertices are listed on the right side.




Fig. 1. A factor graph for the 3-SAT problem specified by formula $(x_1 \vee \bar{x}_2 \vee \bar{x}_4) \wedge (x_1 \vee x_3 \vee \bar{x}_5) \wedge (x_2 \vee x_4 \vee x_5)$. Logic operation
notations are used here to define the problem, where $\vee$ denotes logic OR, $\wedge$ denotes logic AND, and the horizontal bar on a
variable denotes the negation of the variable. The function represented by the factor graph is $[(x_1, x_2, x_4) \in \Gamma_a] \cdot [(x_1, x_3, x_5) \in \Gamma_b] \cdot [(x_2, x_4, x_5) \in \Gamma_c]$,
where $\Gamma_a = \chi^{\{1,2,4\}} \setminus \{(0_1, 1_2, 1_4)\}$, $\Gamma_b = \chi^{\{1,3,5\}} \setminus \{(0_1, 0_3, 1_5)\}$, and $\Gamma_c = \chi^{\{2,4,5\}} \setminus \{(0_2, 0_4, 0_5)\}$.
B. Graph Coloring

Graph coloring or q-COL problems are another family of NP-complete problems. Given an
undirected graph $(\mathcal{A}, \mathcal{E})$ with vertex set $\mathcal{A}$ and edge set $\mathcal{E}$, the objective of the q-COL problem
on $(\mathcal{A}, \mathcal{E})$ is to assign each vertex in $\mathcal{A}$ a color from $q$ different colors such that every pair of
adjacent vertices have different colors. To use the above generic formulation of CSPs, we will
denote the set of all $q$ colors by set $\chi := \{1, 2, \ldots, q\}$. We will denote every undirected edge in
$\mathcal{E}$, say the edge connecting vertices $u$ and $v$, by set $\{u, v\}$. The set $V$ of all coordinates is then
identified with set $\mathcal{A}$, and the set $C$ indexing all constraints is identified with $\mathcal{E}$. Specifically
note that every $c \in C$ is then identified with some $\{u, v\} \in \mathcal{E}$, and $V(c)$ is identified with $c$,
or the corresponding set $\{u, v\}$. Suppose that $c = \{u, v\} \in \mathcal{E}$; then constraint $\Gamma_c$ is identified
with $\chi^{\{u,v\}} \setminus \{(1_u, 1_v), (2_u, 2_v), \ldots, (q_u, q_v)\}$. Fig. 2(b) shows the factor-graph representation of
a q-COL problem on the undirected graph shown in Fig. 2(a).




Fig. 2. (a) An undirected graph. (b) The factor graph for a q-COL problem on graph (a). The global function represented
by the factor graph is $[(x_1, x_2) \in \Gamma_{\{1,2\}}] \cdot [(x_1, x_3) \in \Gamma_{\{1,3\}}] \cdot [(x_2, x_3) \in \Gamma_{\{2,3\}}] \cdot [(x_3, x_4) \in \Gamma_{\{3,4\}}]$, where
$\Gamma_{\{u,v\}} = \chi^{\{u,v\}} \setminus \{(1_u, 1_v), (2_u, 2_v), \ldots, (q_u, q_v)\}$.
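As an illustration of the graph-coloring formulation, the following sketch builds the constraint of each edge as the set of color pairs with distinct entries and checks whether a coloring is proper. The edge list is read off the four factors named in the caption of Fig. 2; the function names and the dict-based layout are assumptions.

```python
from itertools import product

def qcol_constraints(edges, q):
    """Map each edge {u, v} to its constraint: all pairs of distinct colors."""
    colors = range(1, q + 1)
    allowed = {(a, b) for a, b in product(colors, repeat=2) if a != b}
    return {frozenset(edge): allowed for edge in edges}

def is_proper(coloring, constraints):
    """True iff every edge constraint is satisfied by `coloring`."""
    return all(tuple(coloring[v] for v in sorted(edge)) in gamma
               for edge, gamma in constraints.items())

edges = [(1, 2), (1, 3), (2, 3), (3, 4)]   # edges appearing in the caption of Fig. 2
constraints = qcol_constraints(edges, q=3)
print(is_proper({1: 1, 2: 2, 3: 3, 4: 1}, constraints))
```

Because vertices 1, 2 and 3 form a triangle, any proper coloring of this graph needs at least three colors.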

IV. Survey Propagation Algorithms 

A. Survey Propagation for k-SAT Problems 

Extensive study has been carried out to understand the hardness of k-SAT problems (for
$k \ge 3$) and to develop efficient solvers. A parameter $\alpha := |C|/|V|$ is observed to be critically
related to the hardness of random k-SAT problems. There appear to be two thresholds of $\alpha$, denoted
by $\alpha_d$ and $\alpha_c$ ($\alpha_d < \alpha_c$), marking two "phase transitions" [1]. When $\alpha > \alpha_c$, random k-SAT
problems are unsatisfiable (i.e., having no satisfying assignment) with high probability; when
$\alpha_d < \alpha < \alpha_c$, the satisfying assignments form exponentially many disjoint "clusters", making
the problem extremely difficult; when $\alpha < \alpha_d$, the satisfying assignments merge into one huge
cluster and problems are easier. In the regime of $\alpha < \alpha_d$, local search algorithms, such as BP,
may find a satisfying assignment. In the regime of $\alpha_d < \alpha < \alpha_c$, local search algorithms usually
fail.

The discovery and first application of survey propagation (SP) are in solving the k-SAT
problems in the hard regime, where messages are passed on the above-defined factor graphs [1].
In SP, a "joker" symbol "*" is introduced to the variable alphabet $\chi$ of the k-SAT problem, where
$x_v$ equal to the "joker" indicates that it is free to take any value from its original alphabet, and
$x_v$ equal to a non-joker symbol indicates that it is constrained to take the designated value.
Briefly, SP on k-SAT problems may be viewed as an iterative method for estimating the "biases"
of each variable $x_v$ on 0, 1 and * respectively; a variable that is highly biased on 0 or 1
can be fixed to that value, thereby simplifying the problem. It is shown that in the hard regime
of random k-SAT problems, the "joker" symbol connects the disconnected clusters, making SP
remain very effective even for $\alpha$ very close to $\alpha_c$ [15]. For k-SAT problems, the original version
of SP [1] is generalized in [15] to what we call the weighted SP,² or SP($\gamma$), in this paper. SP($\gamma$)
is a family of algorithms parametrized by a real number $\gamma \in [0, 1]$, where SP(1) is the original
SP and for some judicious choice of $\gamma \in (0, 1)$, SP($\gamma$) may have further improved performance.

We note that generalizing SP to the family of weighted SP algorithms has only been reported
for k-SAT problems to date, and one of the objectives of this paper is to extend such a
generalization to arbitrary CSPs.

Similar to BP, in the SP algorithms, messages are passed between variable vertices and function
vertices. For the purpose of describing the SP message-update rule for k-SAT problems, we
introduce the following notations. For any $(v - c)$, $C^u_c(v)$ denotes the set $\{b \in C(v) \setminus \{c\} : L_{v,b} \ne L_{v,c}\}$,
and $C^s_c(v)$ denotes the set $\{b \in C(v) \setminus \{c\} : L_{v,b} = L_{v,c}\}$.



²In [15], weighted SP is referred to as generalized SP. In this paper, we would like to reserve the term "generalized SP" to
refer to SP algorithms generalized for arbitrary CSPs beyond k-SAT problems.
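Following [15], the message-update rule of SP($\gamma$) is described as follows.

The message passed from variable vertex $x_v$ to function vertex $\Gamma_c$, also referred to as a
left message, is a triplet of real numbers $(\Pi^u_{v\to c}, \Pi^s_{v\to c}, \Pi^*_{v\to c})$, and the message passed from
function vertex $\Gamma_c$ to variable vertex $x_v$, also referred to as a right message, is a real number
$\eta_{c\to v} \in [0, 1]$. These messages are updated respectively according to the following equations.

$$\Pi^u_{v\to c} := \Big[1 - \gamma \prod_{b \in C^u_c(v)} (1 - \eta_{b\to v})\Big] \prod_{b \in C^s_c(v)} (1 - \eta_{b\to v}) \tag{2}$$

$$\Pi^s_{v\to c} := \Big[1 - \prod_{b \in C^s_c(v)} (1 - \eta_{b\to v})\Big] \prod_{b \in C^u_c(v)} (1 - \eta_{b\to v}) \tag{3}$$

$$\Pi^*_{v\to c} := \prod_{b \in C^s_c(v)} (1 - \eta_{b\to v}) \prod_{b \in C^u_c(v)} (1 - \eta_{b\to v}) \tag{4}$$

$$\eta_{c\to v} := \prod_{u \in V(c) \setminus \{v\}} \frac{\Pi^u_{u\to c}}{\Pi^u_{u\to c} + \Pi^s_{u\to c} + \Pi^*_{u\to c}} \tag{5}$$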

The initialization of SP messages is usually random, and the message-passing schedule is typically
similar to the flooding schedule [8] in BP message passing, namely, each iteration consists of
all variable vertices passing messages followed by all function vertices passing
messages. We note that throughout this paper all message-passing schedules are restricted to
the flooding schedule for convenience, where each iteration is defined as first updating all "left
messages" and then updating all "right messages".³
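One flooding iteration of the update rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the forms of (2) to (5) in which the left message is ((1 - γ P_u) P_s, (1 - P_s) P_u, P_s P_u), with P_u and P_s the products of (1 - η) over the constraints whose preferred value on v disagrees or agrees with that of c, and the right message is a product of normalized first components. The function name, the dict layout, and the handling of a zero denominator are assumptions.

```python
from math import prod

def sp_iteration(scopes, L, eta, gamma=1.0):
    """One flooding iteration of SP(gamma) for k-SAT.

    scopes: dict c -> list of coordinates V(c)
    L:      dict (v, c) -> preferred value of v in constraint c
    eta:    dict (c, v) -> current right message
    Returns (left, new_eta), where left[(v, c)] = (Pi_u, Pi_s, Pi_star).
    """
    # C(v): constraints involving coordinate v.
    C = {}
    for c, vs in scopes.items():
        for v in vs:
            C.setdefault(v, []).append(c)

    # Left messages, assumed forms of (2)-(4).
    left = {}
    for c, vs in scopes.items():
        for v in vs:
            others = [b for b in C[v] if b != c]
            p_u = prod(1 - eta[b, v] for b in others if L[v, b] != L[v, c])
            p_s = prod(1 - eta[b, v] for b in others if L[v, b] == L[v, c])
            left[v, c] = ((1 - gamma * p_u) * p_s, (1 - p_s) * p_u, p_s * p_u)

    # Right messages, assumed form of (5).
    new_eta = {}
    for c, vs in scopes.items():
        for v in vs:
            value = 1.0
            for u in vs:
                if u == v:
                    continue
                pi_u, pi_s, pi_star = left[u, c]
                total = pi_u + pi_s + pi_star
                # Treating 0/0 as 0 is a choice made for this sketch only.
                value *= pi_u / total if total > 0 else 0.0
            new_eta[c, v] = value
    return left, new_eta

# Toy 3-SAT instance with the preferred values implied by the caption of Fig. 1.
scopes = {"a": [1, 2, 4], "b": [1, 3, 5], "c": [2, 4, 5]}
L = {(1, "a"): 1, (2, "a"): 0, (4, "a"): 0,
     (1, "b"): 1, (3, "b"): 1, (5, "b"): 0,
     (2, "c"): 1, (4, "c"): 1, (5, "c"): 1}
eta0 = {(c, v): 0.5 for c, vs in scopes.items() for v in vs}
left, eta1 = sp_iteration(scopes, L, eta0, gamma=1.0)
```

The uniform initialization at 0.5 is used here only to keep the numbers easy to follow; as noted above, the initialization is usually random.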

Similar to BP, at the end of an iteration, SP may compute a "summary message" at each
variable vertex. For any $v \in V$, define $C^1(v) := \{b \in C(v) : L_{v,b} = 1\}$ and
$C^0(v) := \{b \in C(v) : L_{v,b} = 0\}$; then the "summary message" at $x_v$ is a triplet $(\zeta^1_v, \zeta^0_v, \zeta^*_v)$ of real numbers,
computed by



³An iteration may also include updating all summary messages after updating the right messages; see the description of
summary messages.
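$$\zeta^1_v := \Big[1 - \gamma \prod_{b \in C^1(v)} (1 - \eta_{b\to v})\Big] \prod_{b \in C^0(v)} (1 - \eta_{b\to v}) \tag{6}$$

$$\zeta^0_v := \Big[1 - \gamma \prod_{b \in C^0(v)} (1 - \eta_{b\to v})\Big] \prod_{b \in C^1(v)} (1 - \eta_{b\to v}) \tag{7}$$

$$\zeta^*_v := \gamma \prod_{b \in C^1(v)} (1 - \eta_{b\to v}) \prod_{b \in C^0(v)} (1 - \eta_{b\to v}) \tag{8}$$

where the summary message $(\zeta^1_v, \zeta^0_v, \zeta^*_v)$ is typically normalized to a scaled version
$(\zeta^{1\,\mathrm{norm}}_v, \zeta^{0\,\mathrm{norm}}_v, \zeta^{*\,\mathrm{norm}}_v)$ such that

$$\zeta^{1\,\mathrm{norm}}_v + \zeta^{0\,\mathrm{norm}}_v + \zeta^{*\,\mathrm{norm}}_v = 1.$$

Equations (2) to (8) and the normalization procedure that follows completely specify the message-
update rule of SP($\gamma$).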
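The summary computation and its normalization can be sketched as below. The sketch assumes that the summary is symmetric in the two values: the component for value 1 is (1 - γ P_1) P_0, the component for value 0 is (1 - γ P_0) P_1, and the joker component is γ P_1 P_0, where P_1 and P_0 are the products of (1 - η) over the constraints preferring 1 and 0 at the variable. The function name and argument layout are hypothetical.

```python
from math import prod

def summary(L_v, eta_v, gamma=1.0):
    """Normalized summary (zeta1, zeta0, zeta_star) at one variable.

    L_v:   dict b -> preferred value of the variable in constraint b
    eta_v: dict b -> right message from constraint b to the variable
    """
    p1 = prod(1 - eta_v[b] for b in L_v if L_v[b] == 1)
    p0 = prod(1 - eta_v[b] for b in L_v if L_v[b] == 0)
    z1 = (1 - gamma * p1) * p0
    z0 = (1 - gamma * p0) * p1
    z_star = gamma * p1 * p0
    total = z1 + z0 + z_star
    if total == 0:
        raise ValueError("all summary components vanish; cannot normalize")
    return z1 / total, z0 / total, z_star / total

# Hypothetical variable in two constraints: "a" prefers 1, "b" prefers 0.
print(summary({"a": 1, "b": 0}, {"a": 0.5, "b": 0.0}))
```

In this example only the constraint preferring 1 sends a nonzero message, so the mass is split between value 1 and the joker.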

Usually, SP is applied in conjunction with a heuristic "decimation" procedure, which is carried
out after SP converges or after a certain number of SP iterations. In the decimation procedure,
the "polarity" $B(v) := \zeta^{0\,\mathrm{norm}}_v - \zeta^{1\,\mathrm{norm}}_v$ of each $v \in V$ is calculated, and the most polarized
variable (namely, one having the highest $|B(v)|$) is fixed to 0 or 1 according to the sign of $B(v)$:
$x_v$ is set to 0 if $B(v) > 0$, and to 1 otherwise. The k-SAT problem is then simplified and SP
is applied again. This process iterates until the reduced problem is simple enough for a local
search algorithm.
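The selection step of the decimation procedure can be sketched as follows, assuming the polarity is the normalized summary component for value 0 minus the one for value 1. The function name `pick_variable` and the input layout are hypothetical, and the subsequent simplification of the problem is not shown.

```python
def pick_variable(summaries):
    """Choose the most polarized variable and the value to fix it to.

    summaries: dict v -> normalized (zeta1, zeta0, zeta_star)
    Returns (v, value), with value 0 if the polarity is positive and 1 otherwise.
    """
    def polarity(v):
        zeta1, zeta0, _ = summaries[v]
        return zeta0 - zeta1

    v = max(summaries, key=lambda u: abs(polarity(u)))
    return v, (0 if polarity(v) > 0 else 1)

# Hypothetical summaries for three variables.
summaries = {1: (0.5, 0.0, 0.5), 2: (0.1, 0.8, 0.1), 3: (0.3, 0.3, 0.4)}
print(pick_variable(summaries))
```

Ties in absolute polarity are broken here by dictionary order, which is an arbitrary choice of this sketch.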

When $\gamma = 1$, it is shown in [19] and [15] that the passed messages as in (2) through (5) can be
interpreted probabilistically, namely, $\eta_{c\to v}$ may be interpreted as the probability that a "warning"
symbol is sent from $\Gamma_c$ to $x_v$, and $\Pi^u_{v\to c}$, $\Pi^s_{v\to c}$ and $\Pi^*_{v\to c}$ are respectively the probabilities that
$x_v$ sends to $\Gamma_c$ symbol $\bar{L}_{v,c}$, symbol $L_{v,c}$ and symbol *.

When $\gamma < 1$, however, SP($\gamma$) can no longer be interpreted probabilistically. We now present a
slightly modified formulation of SP($\gamma$), referred to as SP*($\gamma$), which is completely equivalent to
SP($\gamma$) defined in [15], and which will be shown in a later section to have a natural probabilistic
interpretation.

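In SP*($\gamma$), the left message $(\Pi^u_{v\to c}, \Pi^s_{v\to c}, \Pi^*_{v\to c})$ passed from variable vertex $x_v$ to function
vertex $\Gamma_c$ is modified to the equations given in (9) to (11), and the right message $\eta_{c\to v}$ passed
from function vertex $\Gamma_c$ to variable vertex $x_v$ and the summary message $(\zeta^1_v, \zeta^0_v, \zeta^*_v)$ at variable
$x_v$ stay unchanged.

$$\Pi^u_{v\to c} := \Big(1 - \gamma \prod_{b \in C^u_c(v)} (1 - \eta_{b\to v})\Big) \prod_{b \in C^s_c(v)} (1 - \eta_{b\to v}) \tag{9}$$

$$\Pi^s_{v\to c} := \Big(1 - \gamma \prod_{b \in C^s_c(v)} (1 - \eta_{b\to v})\Big) \prod_{b \in C^u_c(v)} (1 - \eta_{b\to v}) \tag{10}$$

$$\Pi^*_{v\to c} := \gamma \prod_{b \in C^s_c(v)} (1 - \eta_{b\to v}) \prod_{b \in C^u_c(v)} (1 - \eta_{b\to v}) \tag{11}$$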
The following lemma shows that SP($\gamma$) and SP*($\gamma$) are equivalent.

Lemma 1: For the same initialization of $\{\eta_{c\to v} : \forall (v - c)\}$, at any given iteration, SP*($\gamma$) and
SP($\gamma$) give rise to identical results in $\eta_{c\to v}$ for every $(v - c)$, and in $(\zeta^1_v, \zeta^0_v, \zeta^*_v)$ for every $v \in V$.

Proof: The lemma follows from the fact that in the computation of $\eta_{c\to v}$, and hence of $(\zeta^1_v, \zeta^0_v, \zeta^*_v)$,
$\Pi^s_{v\to c}$ and $\Pi^*_{v\to c}$ always appear together in the form of $\Pi^s_{v\to c} + \Pi^*_{v\to c}$. But it is easy to see that in
SP($\gamma$) and in SP*($\gamma$), $\Pi^s_{v\to c} + \Pi^*_{v\to c}$ has the same parametric form, both equal to $\prod_{b \in C^u_c(v)} (1 - \eta_{b\to v})$.
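The identity invoked in the proof can be verified in two lines. The shorthands $P_s$ and $P_u$ are introduced here only for this check, and the forms assumed for the second and third components of the left message are $(1 - P_s) P_u$ and $P_s P_u$ in SP($\gamma$), and $(1 - \gamma P_s) P_u$ and $\gamma P_s P_u$ in SP*($\gamma$).

```latex
\begin{align*}
P_s &:= \prod_{b \in C^s_c(v)} (1 - \eta_{b\to v}), \qquad
P_u := \prod_{b \in C^u_c(v)} (1 - \eta_{b\to v}), \\
\text{SP}(\gamma):\quad \Pi^s_{v\to c} + \Pi^*_{v\to c}
  &= (1 - P_s)\,P_u + P_s P_u = P_u, \\
\text{SP}^*(\gamma):\quad \Pi^s_{v\to c} + \Pi^*_{v\to c}
  &= (1 - \gamma P_s)\,P_u + \gamma P_s P_u = P_u.
\end{align*}
```

Since the first component of the left message is the same in both versions, the ratio entering the right message is unchanged, which is what the lemma requires.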

We conclude this subsection by remarking that it is possible to verify that all results concerning
SP($\gamma$) in [15] hold for SP*($\gamma$).⁴ As such, in the rest of this paper, SP*($\gamma$) rather than SP($\gamma$)
will be taken as the weighted SP for k-SAT problems.



B. Survey Propagation for q-COL Problems 

Similar to SP developed for k-SAT problems, in q-COL problems, SP passes messages between
the variable vertices and the function (constraint) vertices in the factor-graph representation of
the problem. Some notable differences however exist.

First, weighted SP has not been developed for q-COL problems to date, and it is not even
clear whether such an algorithm family, if it exists, can be developed in a manner similar to that
for k-SAT in [15], namely, by reducing the BP algorithm derived from a properly defined MRF.
Deferring the answer to this question to a later section, we here review only the original version of
SP applied to 3-COL problems following the formulation in [3], which is analogous to SP(1),
or the non-weighted SP, in the context of k-SAT.

⁴Specifically, we note that BP on the MRF formulated in [15] will also reduce to SP*($\gamma$). We leave this for the interested
readers to verify.

Second, the SP messages for q-COL problems can be expressed more compactly, due to a
specific nature of the problem, on which we now elaborate.

For g-COL problems, each constraint vertex has degree 2. This allows the combination of the 
message passed from variable x« to a neighboring constraint, say Fc, with the message passed 
from constraint Fc to the other neighbor, say x.^, of F^. As a consequence, Fc may be suppressed 
in the factor graph, and messages are directly passed between variable vertices that are distance 2 
apart If (or equivalently, messages are passed on graph (A, S)). Following [|3l, a compact version 
of SP message-passing rule for 3-COL problems is given as follows, where the message passed 
from variable Xu to variable Xy is a quadruplet of real numbers ?7^_,^,, ?7^^„, ?7*_^^,). For 

i = 1,2,3, 

n (i-^t^j-E n {v*n,-.u+vi->u) + n ^;^n 

^ w£N{u)\{v} j^iw£N{u)\{v} weN(u)\{v} 



E n (i-vi^u)- E n iv*^-.u+vi-^u)+ n 

j=l,2,3w&N{u)\{v} j=l,2,3w(^N(u)\{v} w&N{u)\{v} 

(12) 



where N(u) is the set {v : v E V, {u, v} E S}, namely, the set of neighboring vertices of vertex 
u on graph {A, S}; and 

3=1,2,3 
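As an illustration of the compact update described above, the following is a minimal Python sketch under stated assumptions. The function name `sp_3col_update` and the dictionary layout with keys `1`, `2`, `3`, `'*'` are choices made here for illustration and do not come from [3] or from this paper.

```python
from math import prod

COLORS = (1, 2, 3)

def sp_3col_update(incoming):
    """Compute eta_{u->v} from the messages eta_{w->u}, w in N(u) minus v.

    Each message is a dict with keys 1, 2, 3 and '*' (hypothetical layout).
    """
    def numerator(i):
        return (prod(1 - m[i] for m in incoming)
                - sum(prod(m['*'] + m[j] for m in incoming)
                      for j in COLORS if j != i)
                + prod(m['*'] for m in incoming))

    denominator = (sum(prod(1 - m[j] for m in incoming) for j in COLORS)
                   - sum(prod(m['*'] + m[j] for m in incoming) for j in COLORS)
                   + prod(m['*'] for m in incoming))
    out = {i: numerator(i) / denominator for i in COLORS}
    out['*'] = 1 - sum(out.values())  # joker component, as in the last equation
    return out

# Two neighbors that are certain of colors 1 and 2, respectively.
certain_1 = {1: 1.0, 2: 0.0, 3: 0.0, '*': 0.0}
certain_2 = {1: 0.0, 2: 1.0, 3: 0.0, '*': 0.0}
result = sp_3col_update([certain_1, certain_2])
```

In this small case the only color left for $x_u$ is 3, so the outgoing message concentrates on that color.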

For 3-COL problems, the "summary message" computed at each variable vertex $x_v$ is a quadruplet of real numbers, denoted by $(\zeta^1_v, \zeta^2_v, \zeta^3_v, \zeta^*_v)$, where for $i = 1, 2, 3$,

$$\zeta^i_v := \frac{\displaystyle\prod_{u\in N(v)} \left(1-\eta^i_{u\to v}\right) - \sum_{j\neq i}\ \prod_{u\in N(v)} \left(\eta^*_{u\to v}+\eta^j_{u\to v}\right) + \prod_{u\in N(v)} \eta^*_{u\to v}}{\displaystyle\sum_{j=1,2,3}\ \prod_{u\in N(v)} \left(1-\eta^j_{u\to v}\right) - \sum_{j=1,2,3}\ \prod_{u\in N(v)} \left(\eta^*_{u\to v}+\eta^j_{u\to v}\right) + \prod_{u\in N(v)} \eta^*_{u\to v}}$$

and

$$\zeta^*_v := 1 - \sum_{i=1,2,3} \zeta^i_v.$$

Similar to that for $k$-SAT problems, the summary message for a 3-COL problem at variable $x_v$ may indicate the "bias" of variable $x_v$ to each letter in $\{1, 2, 3, *\}$. In the decimation procedure for 3-COL problems, carried out in a way similar to that for $k$-SAT problems, a variable is fixed to a color $i \in \{1, 2, 3\}$ if it is highly biased to that color. The reader is referred to [3] for a detailed account of a heuristic decimation rule used in solving 3-COL problems using SP.

Footnote: While still implementing the flooding schedule, the SP message-update rule for 3-COL problems suppresses the passing of one set of messages (say, for example, the right messages) by including the computation of these messages in updating the other set of messages.

We note that this paper primarily focuses on SP update equations, where the decimation aspect of SP is largely ignored.

IV. SP as Probabilistic Token Passing

To date, SP algorithms have been applied to various other CSPs, for example, in coding for Blackwell channels [4], in quantization of Bernoulli sources [5], and in solving graph coloring problems [3]. However, a general formulation of SP, particularly that of weighted SP, for solving arbitrary non-binary CSPs has been largely missing. Specifically, we note the following milestones in the formulation of SP algorithms.

• The work of [19] presents a non-weighted version of SP formulas for general CSPs beyond those involving only binary variables. However, the exposition of [19] uses the language of statistical physics, which is rather remote to the engineering community, and a cleaner and more accessible formulation of SP, and particularly of weighted SP, is desirable for general problems.

• The work of [15] presents weighted SP for $k$-SAT problems, in which weighted SP is treated as a special case of BP in a properly defined MRF. This treatment of SP and the corresponding principle for developing weighted SP are conceivably applicable to all binary CSPs. However, it has remained open, prior to this work, whether such an approach to understanding and developing weighted SP is applicable to arbitrary non-binary CSPs.

The line of development in this section is summarized below. 

We will first present an understanding of non-weighted SP for arbitrary CSPs (namely, that formulated in [19]) in terms of "probabilistic token passing" (PTP). Although a similar understanding has been previously reported in various contexts, we here stress the role of extending the variable alphabet in SP algorithms, and explicitly point out that the alphabet extension is not simply to include an extra joker symbol, but to replace the variable alphabet with its power set (excluding the empty-set element). To make the PTP procedure more intuitive, prior to defining PTP, we will introduce a precursor of PTP, which we call "deterministic token passing" (DTP).

After introducing PTP, we then show that the probabilistic interpretation of non-weighted SP in terms of PTP makes it naturally generalizable to a weighted version, which we call weighted PTP. As a brief preview, the generalization of PTP to weighted PTP essentially involves generalizing a functional dependency in the PTP message-update rule to a probabilistic dependency. Interestingly, as we will show, for $k$-SAT problems weighted PTP precisely coincides with the weighted SP of [15]. This supports the view that weighted PTP is a generalization of weighted SP to arbitrary CSPs.

The outline of this section is as follows. Subsection IV-A introduces the notion of alphabet extension and related concepts. Subsection IV-B defines DTP as a precursor of PTP. In Subsection IV-C, we introduce PTP. In Subsection IV-D, we show that PTP is equivalent to SP, using the 3-COL problem as an example. In Subsection IV-E, we introduce weighted PTP. In Subsection IV-F, we show that weighted PTP generalizes weighted SP, using $k$-SAT problems as an example.

A. Alphabet Extension 

For a given CSP with variable alphabet $\chi$, we define the extended alphabet $\chi^*$ as the power set of $\chi$ excluding the empty set $\emptyset$. That is, $\chi^* := \{t : t \subseteq \chi, t \neq \emptyset\}$. The extended alphabet $\chi^*$ of $k$-SAT problems is then the set $\{\{0\}, \{1\}, \{0, 1\}\}$. For 3-COL problems, $\chi^*$ is the set $\{\{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\}$. Each element $t$ of $\chi^*$ will be written as a string, in bold font, containing the elements of $t$. For example, we may write $\{1, 2\}$ as $\mathbf{12}$, $\{1, 2, 3\}$ as $\mathbf{123}$ and $\{1\}$ simply as $\mathbf{1}$.
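The construction of the extended alphabet can be sketched in a few lines of Python. This is only an illustrative sketch; the helper names `extended_alphabet` and `token_name` are introduced here and are not part of the paper.

```python
from itertools import combinations

def extended_alphabet(alphabet):
    """Return all non-empty subsets of `alphabet` as frozensets."""
    symbols = sorted(alphabet)
    return [frozenset(subset)
            for size in range(1, len(symbols) + 1)
            for subset in combinations(symbols, size)]

def token_name(token):
    """Write a subset as the string of its elements, e.g. {1, 2} -> '12'."""
    return "".join(str(symbol) for symbol in sorted(token))

sat_tokens = [token_name(t) for t in extended_alphabet({0, 1})]
col_tokens = [token_name(t) for t in extended_alphabet({1, 2, 3})]
```

For an alphabet of size $q$ the extended alphabet has $2^q - 1$ elements, which gives 3 elements for $k$-SAT and 7 for 3-COL.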

Given any subset $U \subseteq V$, a $\chi^*$-assignment $y_U$ on $U$ is referred to as a rectangle on $U$. The set of all rectangles on $U$ is denoted by $(\chi^*)^U$. Given rectangle $y_U \in (\chi^*)^U$, for every $v \in U$, $y_{U:\{v\}}$, or simply written as $y_v$ following an earlier convention of this paper, is referred to as the $v$-side of $y_U$. Clearly, rectangle $y_U$ has $|U|$ sides, and may also be written as the concatenation of all its sides, namely, as $(y_v)_{v\in U}$.

For any $v \in V$, an elementary $\chi^*$-assignment $t_v \in (\chi^*)^{\{v\}}$ will be referred to as a token on $v$. Using this nomenclature, the $v$-side of any rectangle is a token on $v$. We note that a token $t_v$ may be interpreted as a set of elementary $\chi$-assignments on $\{v\}$, which is in fact the set of all elementary $\chi$-assignments on $\{v\}$ that assign $v$ a value in set $t_v(v) \subseteq \chi$. For example, suppose that $\chi := \{1, 2, 3\}$; then token $\mathbf{12}_v$ may be identified with the set $\{1_v, 2_v\}$ of elementary $\chi$-assignments on $\{v\}$.

It is worth noting that when a token $t_v$ is identified with a set of elementary $\chi$-assignments on $v$, a rectangle $(t_v)_{v\in U}$ may be identified with the Cartesian product of all its sides. For example, rectangle $(\mathbf{12}_v, \mathbf{23}_u)$ may be interpreted as the following set of $\chi$-assignments on $\{v, u\}$: $\{(1_v, 2_u), (1_v, 3_u), (2_v, 2_u), (2_v, 3_u)\}$. Under this interpretation, we will also make frequent use of the Cartesian product notation, writing rectangle $(\mathbf{12}_v, \mathbf{23}_u)$ as $\mathbf{12}_v \times \mathbf{23}_u$, and rectangle $(t_v)_{v\in U}$ as $\prod_{v\in U} t_v$. Note that this interpretation is in fact the reason for which we choose the terminologies "rectangle" and "side".

For simplicity, from here on, we shall reserve the term "assignment" for a $\chi$-assignment only, and a $\chi^*$-assignment will be referred to as a "rectangle", "side" or "token".

We say that an assignment $x_U$ on $U$ is contained in rectangle $y_U$ if $x_{U:\{v\}}(v) \in y_{U:\{v\}}(v)$ for every $v \in U$. For example, assignment $(1_v, 2_u)$ is contained in rectangle $(\mathbf{13}_v, \mathbf{23}_u)$. We will use $x_U \in y_U$ to denote this containedness relationship, since this notation is precise when the rectangle $y_U$ is interpreted as a set of assignments on $U$.

Given a CSP and a $(v - c)$ pair, we define function $F_{c\to v} : (\chi^*)^{V(c)\setminus\{v\}} \to (\chi^*)^{\{v\}}$ as follows: for every rectangle $\prod_{u\in V(c)\setminus\{v\}} t_u$ on $V(c) \setminus \{v\}$,

$$F_{c\to v}\Big(\prod_{u\in V(c)\setminus\{v\}} t_u\Big) := \bigg(\Big(\chi^{\{v\}} \times \prod_{u\in V(c)\setminus\{v\}} t_u\Big) \cap \Gamma_c\bigg)_{:\{v\}}.$$

We often write $F_{c\to v}$ in short as $F_c$, since the domain and co-domain of the function may be recovered from the form of its argument. Given rectangle $\prod_{u\in V(c)\setminus\{v\}} t_u$ on $V(c) \setminus \{v\}$, we call $F_c\big(\prod_{u\in V(c)\setminus\{v\}} t_u\big)$ the forced token by rectangle $\prod_{u\in V(c)\setminus\{v\}} t_u$ via constraint $\Gamma_c$. It is easy to verify that the forced token $F_c\big(\prod_{u\in V(c)\setminus\{v\}} t_u\big)$ is simply the set of all (elementary) assignments on $\{v\}$ which, when concatenated with an assignment on $V(c) \setminus \{v\}$ contained in rectangle $\prod_{u\in V(c)\setminus\{v\}} t_u$, make local constraint $\Gamma_c$ satisfied. We now give some examples using the toy 3-SAT problem shown in Fig. 1 to illustrate this definition. Consider constraint $\Gamma_a$. If rectangle $t_{\{1,2\}}$ on $\{1, 2\}$ is defined as $(\mathbf{1}_1, \mathbf{01}_2)$, then forced token $F_a(t_{\{1,2\}}) = \mathbf{01}_4$, since when assigning variable $x_4$ either value 0 or 1, it is possible to find an assignment of variables $x_1$ and $x_2$ in rectangle $t_{\{1,2\}}$ that makes $\Gamma_a$ satisfied. On the other hand, if $t_{\{1,2\}} = (\mathbf{0}_1, \mathbf{1}_2)$, then forced token $F_a(t_{\{1,2\}}) = \mathbf{0}_4$, since rectangle $t_{\{1,2\}}$ contains a single assignment of $x_1$ and $x_2$ (namely $(0_1, 1_2)$), and the only assignment of $x_4$ that will make constraint $\Gamma_a$ satisfied is the one assigning 0 to $x_4$, namely $0_4$.

A "monotonicity property" of function $F_c$, stated in the following lemma, follows immediately from the definition of the function.

Lemma 2: Suppose that $x_v$ and $\Gamma_c$ are a pair of neighboring variable and constraint vertices in the factor graph, and that $y_{V(c)\setminus\{v\}}$ and $y'_{V(c)\setminus\{v\}}$ are two rectangles on $V(c) \setminus \{v\}$. Then $y_{V(c)\setminus\{v\}} \subseteq y'_{V(c)\setminus\{v\}}$ implies that $F_c\big(y_{V(c)\setminus\{v\}}\big) \subseteq F_c\big(y'_{V(c)\setminus\{v\}}\big)$.
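The forced-token function and its monotonicity can be illustrated with a brute-force Python sketch. Everything named here is an assumption made for illustration: `forced_token` is a hypothetical helper, and `clause_a` is a clause on $x_1, x_2, x_4$ chosen only so that it reproduces the two forced tokens quoted in the example above; the actual clause drawn in Fig. 1 is not reproduced in this section.

```python
from itertools import product

def forced_token(constraint, v, alphabet, rectangle):
    """Values of variable v that, together with some assignment contained in
    `rectangle` (a dict: other variable -> set of values), satisfy `constraint`."""
    others = sorted(rectangle)
    forced = set()
    for value in alphabet:
        for combo in product(*(sorted(rectangle[u]) for u in others)):
            assignment = dict(zip(others, combo))
            assignment[v] = value
            if constraint(assignment):
                forced.add(value)
                break
    return frozenset(forced)

def clause_a(x):
    # Hypothetical clause: violated only by x1 = 0, x2 = 1, x4 = 1.
    return x[1] == 1 or x[2] == 0 or x[4] == 0

wide = forced_token(clause_a, 4, (0, 1), {1: {1}, 2: {0, 1}})
narrow = forced_token(clause_a, 4, (0, 1), {1: {0}, 2: {1}})
# A rectangle that contains the second one, to illustrate Lemma 2.
larger = forced_token(clause_a, 4, (0, 1), {1: {0, 1}, 2: {1}})
```

An empty return value corresponds to a rectangle from which the constraint cannot be satisfied for any value of $x_v$.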

B. Deterministic Token Passing (DTP) 

As a step toward introducing, for arbitrary CSPs, a probabilistic interpretation of non-weighted SP (namely, PTP) and generalizing it to a weighted version (namely, weighted PTP), in this subsection we first introduce an algorithmic procedure, which we call deterministic token passing, or DTP. We note that the purpose of introducing DTP is to provide easier access to PTP, a procedure to be introduced in the next subsection.

In DTP, messages are tokens passed along the edges of the factor graph representing the CSP of interest. Specifically, the token passed from and to each variable $x_v$ is a token on $v$, or equivalently, a set of (elementary) assignments on $\{v\}$. For any pair of neighboring vertices $x_v$ and $\Gamma_c$ on the factor graph, the token, or left message, $t_{v\to c}$ passed from variable $x_v$ to constraint $\Gamma_c$ depends on all incoming tokens (right messages) passed to $x_v$, except that passed from $\Gamma_c$. Similarly, the token, or right message, $t_{c\to v}$ passed from constraint $\Gamma_c$ to variable $x_v$ depends on all incoming tokens (left messages) passed to $\Gamma_c$ except that passed from $x_v$. Each iteration of token passing in DTP consists of every variable passing a token on each of its edges, followed by every constraint passing a token on each of its edges. Within any iteration, the token-passing rule of DTP is given as follows.

$$t_{v\to c} := \bigcap_{b\in C(v)\setminus\{c\}} t_{b\to v} \tag{14}$$

$$t_{c\to v} := F_c\Big(\prod_{u\in V(c)\setminus\{v\}} t_{u\to c}\Big) \tag{15}$$

That is, the token passed from a variable is the intersection of its incoming tokens from the 
upstream, whereas the token passed from a constraint is the forced token via the constraint by 
the rectangle formed by the upstream incoming tokens as sides. 

This message-passing rule can be illustrated by the following analogy. We may view the token sent from a variable as the "intention" of the variable, indicating the possible values that the variable intends to take. On the other hand, we may view the token sent from a constraint as the "command" from the constraint, indicating the possible values that the constraint allows the destination variable to take. If $a$ is an intention and $b$ is a command, where both are tokens on the same coordinate, then the relationship $a \subseteq b$ may be read as "intention $a$ obeys command $b$". Under this perspective, the token sent from a variable is the "maximal" intention of the variable that obeys all incoming commands from the upstream constraints; on the other hand, the token sent from a constraint is the "maximal" command that is "compatible" with all incoming intentions from the upstream variables. Here "maximality" is in the sense of maximizing the cardinality of the subset of assignments, and "compatibility" is in the sense of satisfying the local constraint.
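For 3-COL problems, where every constraint has degree 2 and requires its two variables to differ, the two DTP rules reduce to very short set operations. The sketch below is illustrative only; the function names are assumptions introduced here.

```python
COLORS = frozenset({1, 2, 3})

def variable_token(incoming_commands):
    """Intersect the tokens arriving from the upstream constraints."""
    token = COLORS
    for command in incoming_commands:
        token = token & command
    return token  # an empty result is not a valid token; DTP would terminate

def coloring_constraint_token(incoming_intention):
    """Forced token for a constraint x_u != x_v: the colors of x_v that differ
    from at least one color in the token received from x_u."""
    return frozenset(c for c in COLORS
                     if any(c != d for d in incoming_intention))

intention = variable_token([frozenset({1, 2}), frozenset({2, 3})])
command_from_singleton = coloring_constraint_token(frozenset({1}))
command_from_pair = coloring_constraint_token(frozenset({1, 2}))
```

Note that a non-singleton intention always produces the full command $\mathbf{123}$ in this setting, while a singleton intention removes exactly one color.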

Examples of token passing for a 3-COL problem are illustrated in Fig. 3.

Fig. 3. Examples of deterministic token passing for a 3-COL problem. (a) Token $t_{c\to v}$ passed from constraint $\Gamma_c$ to variable $x_v$. (b) Token $t_{v\to c}$ passed from variable $x_v$ to constraint $\Gamma_c$.



A summary message or "summary token" $t_v$ at variable vertex $x_v$ may be computed according to the rule in (16), for each $v \in V$, at any iteration after all constraint vertices have passed tokens.

$$t_v := \bigcap_{c\in C(v)} t_{c\to v}. \tag{16}$$

Using the "intention/command" analogy, the summary token at a variable is the "maximal" intention of the variable that obeys the incoming commands from all directions.

Some caution is needed regarding the well-definedness of the updating rule for passed tokens and summary tokens. That is, in (14), (15) and (16), the right-hand side can be equal to the empty set $\emptyset$, which is not a well-defined token. Whenever a not-well-defined token (i.e., the empty set) arises from the updating rule in an iteration, we may force DTP to terminate. As we will see later, in the "random" version of DTP (i.e., PTP and weighted PTP), we will eventually condition on the case in which these events do not happen.

At any iteration, one may read out the summary tokens at all variable vertices and form a rectangle on $V$ using these tokens as its sides. It is clear that at any given iteration, the resulting rectangle formed by the summary tokens depends on the initialization of DTP.

Although our primary purpose in introducing DTP is to smooth the transition to understanding PTP, in Appendix A we present some elementary results concerning the dynamics of DTP. We note that those results will also be used to derive some insights on the dynamics of PTP, an algorithmic procedure that we introduce next as a simple formulation of SP.

C. Probabilistic Token Passing (PTP) 

We now introduce the "probabilistic token passing" (or PTP) procedure. The key distinction between PTP and DTP is that on each edge and along each direction, PTP passes a random token, and the messages being updated in PTP are the distributions of the random tokens.

Specifically, the PTP message-update rule can be constructed by considering the following mechanism of passing random tokens.

1) On each edge connecting variable $x_v$ and constraint $\Gamma_c$ in the factor graph, the token $t_{v\to c}$ passed to constraint $\Gamma_c$ and the token $t_{c\to v}$ passed to variable $x_v$ are both random variables, distributed over $(\chi^*)^{\{v\}}$.

2) For any given vertex in the factor graph, all of its incoming random tokens are assumed to be independent.

3) For any given vertex in the factor graph, the outgoing random token sent along any edge is a function of all the incoming random tokens from the upstream, where the functional dependency is precisely that specified in DTP, namely, (14) or (15), depending on whether the vertex is a variable vertex or a function (constraint) vertex.

4) The summary (random) token at each variable vertex $x_v$ is a function of all incoming random tokens, where the functional dependency is precisely that specified in DTP, namely, (16).

Building on this mechanism, we then define each PTP (passed or summary) message as the distribution of the corresponding random token conditioned on the token being well defined (namely, not equal to the empty set). We note that such a "conditioning" merely involves a normalization (namely, scaling) of each message so that it sums to 1 over all valid tokens. We will use $\lambda_{v\to c}$ to denote the message sent from $x_v$ to $\Gamma_c$, also referred to as a left message, $\rho_{c\to v}$ to denote the message sent from $\Gamma_c$ to $x_v$, also referred to as a right message, and $\mu_v$ to denote the summary message at variable vertex $x_v$. It is then straightforward to derive the message-update rule of PTP as follows, where the superscript "norm" on a message indicates that the message has been normalized.



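PTP Message-Update Rule

$$\lambda_{v\to c}(t_{v\to c}) := \sum_{(t_{b\to v})_{b\in C(v)\setminus\{c\}}} \Big[t_{v\to c} = \bigcap_{b\in C(v)\setminus\{c\}} t_{b\to v}\Big] \prod_{b\in C(v)\setminus\{c\}} \rho^{\mathrm{norm}}_{b\to v}(t_{b\to v}) \tag{17}$$

$$\rho_{c\to v}(t_{c\to v}) := \sum_{(t_{u\to c})_{u\in V(c)\setminus\{v\}}} \Big[t_{c\to v} = F_c\Big(\prod_{u\in V(c)\setminus\{v\}} t_{u\to c}\Big)\Big] \prod_{u\in V(c)\setminus\{v\}} \lambda^{\mathrm{norm}}_{u\to c}(t_{u\to c}) \tag{18}$$

$$\mu_v(t_v) := \sum_{(t_{c\to v})_{c\in C(v)}} \Big[t_v = \bigcap_{c\in C(v)} t_{c\to v}\Big] \prod_{c\in C(v)} \rho^{\mathrm{norm}}_{c\to v}(t_{c\to v}) \tag{19}$$

and the normalized messages are defined as

$$\lambda^{\mathrm{norm}}_{v\to c}(t) := \lambda_{v\to c}(t) \Big/ \sum_{t'\in(\chi^*)^{\{v\}}} \lambda_{v\to c}(t') \tag{20}$$

$$\rho^{\mathrm{norm}}_{c\to v}(t) := \rho_{c\to v}(t) \Big/ \sum_{t'\in(\chi^*)^{\{v\}}} \rho_{c\to v}(t') \tag{21}$$

$$\mu^{\mathrm{norm}}_{v}(t) := \mu_{v}(t) \Big/ \sum_{t'\in(\chi^*)^{\{v\}}} \mu_{v}(t') \tag{22}$$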
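The left-message update and its normalization can be sketched by direct enumeration over all combinations of incoming tokens. This is a brute-force illustration, not an efficient implementation; the function names and the dictionary representation of messages (token as `frozenset` mapped to probability) are assumptions introduced here, and at least one incoming message is assumed.

```python
from itertools import product

def ptp_left_message(incoming):
    """Un-normalized left message: sum the probability of every combination of
    incoming tokens whose intersection is a given (non-empty) token."""
    out = {}
    for combo in product(*(message.items() for message in incoming)):
        token = None
        weight = 1.0
        for t, p in combo:
            token = t if token is None else token & t
            weight *= p
        if token:  # the empty set is not a valid token and is dropped
            out[token] = out.get(token, 0.0) + weight
    return out

def normalize(message):
    """Condition on the token being well defined by rescaling to sum to 1."""
    total = sum(message.values())
    return {t: w / total for t, w in message.items()}

FULL = frozenset({1, 2, 3})
m1 = {frozenset({1}): 0.5, FULL: 0.5}
m2 = {frozenset({2}): 0.5, FULL: 0.5}
left = ptp_left_message([m1, m2])
left_norm = normalize(left)
```

In this toy input one of the four token combinations has an empty intersection, so the un-normalized message sums to 0.75 and normalization rescales the three surviving tokens to equal weight.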



We note that the update of messages in each PTP iteration proceeds by first computing the un-normalized messages and then computing their normalized versions.

D. SP as PTP

We now show that SP is precisely PTP using the example of 3-COL problems. Here we note that it is possible (and entails little additional difficulty) to show the equivalence between PTP and the general formulation of non-weighted SP [19] for arbitrary CSPs. However, as we feel it unnecessary to distract the readers with the additional statistical physics terminology presented in [19], we choose not to repeat the exposition of SP in [19] and only show that SP is PTP for the special case of 3-COL problems.

In the factor graph representing a 3-COL problem, noting that each constraint vertex has degree 2, we will make a slight abuse of notation: for any $(v - c)$ pair, we will use $V(c) \setminus \{v\}$ to also denote the index of the unique other variable vertex (besides $x_v$) connecting to $\Gamma_c$, although $V(c) \setminus \{v\}$ originally refers to the singleton set containing that index. Whether $V(c) \setminus \{v\}$ should be treated as the index of a variable or as the singleton set containing the index should be clear from the context.

For notational simplicity, from here on, for every element in the token set $(\chi^*)^{\{v\}}$, when no ambiguity results, we will suppress the subscript indicating the coordinate of the element. For example, we will write $\mathbf{12}_v$ as $\mathbf{12}$ when the subscript can be recovered from the context. Additionally, we will use $i$, $j$, and $k$ to denote the three distinct colors 1, 2, and 3 in the 3-COL problem, so that token $\mathbf{i}$ can refer to any token that is a singleton set, token $\mathbf{ij}$ can refer to any token that contains a pair of assignments, and token $\mathbf{ijk}$ refers to the token containing all three assignments.

Using these notations, the PTP message-update rule for 3-COL problems can be easily derived, and is presented in the following lemma.

Lemma 3: For 3-COL problems, the PTP message-update rule is: 
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A_,(i) := n (P6T(ij)+pnjr(ik)+pn"^(ijk))- n iPbZ:\i3)+PbZ:'m) 

beC(v)\{c} b€Civ)\{c} 

- n (Prr(ik)+pn."^(ijk))+ n Pn:r(ijk) (23) 

beC{v)\{c} b€Civ)\{c} 

A.^,(ij) := n (P6lT(ij) + Pnjr(ijk))- n Pnjr(ijk) (24) 

beC(?;)\{c} 6gC(d)\{c} 

A.^,(ijk) := n PnT(ijk) (25) 

bec{v)\{c} 

Pe^„(ij) := A-(;:f\^,„j_,,(k) (26) 

p,^.(ijk) := A",?(;7\^,„j_(ij) + A-,?f,y\^,^^^ (27) 

/^.(i) := n (Pc^T(ij)+Pc^?r(ik)+pnT(ijk))- J] (PcT(ij) + Pc^T(ijk)) 

cec(t>) c6C(?j) 

- n (PcT(ik)+P^T(ijk))+ n PcT(yk) (28) 
P.(ij) := n (p"T(ij)+PcT(ijk))- J] PcT(ijk) (29) 

/i.(ijk) := n PclT(ijk). (30) 

c€Civ) 

It is then possible to relate the PTP messages and the (non-weighted) SP messages for 3-COL 
problems, and show their equivalence. 

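Theorem 1: For 3-COL problems, the correspondence between SP and PTP message-update rules is

$$\begin{aligned}
\eta^i_{u\to v} &\leftrightarrow \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{i}), &\qquad \eta^*_{u\to v} &\leftrightarrow 1 - \sum_{i=1,2,3} \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{i}),\\
\zeta^i_{v} &\leftrightarrow \mu^{\mathrm{norm}}_{v}(\mathbf{i}), &\qquad \zeta^*_{v} &\leftrightarrow 1 - \sum_{i=1,2,3} \mu^{\mathrm{norm}}_{v}(\mathbf{i}).
\end{aligned} \tag{31}$$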

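Proof: First we identify $c$ in the subscript of $\lambda^{\mathrm{norm}}_{u\to c}$ with $\{u, v\}$, in which $v$ indexes the destination vertex in the subscript of $\eta_{u\to v}$.

For any $c = \{u, v\}$, let $\alpha_{u,v} = \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{ij}) + \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{ik}) + \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{jk}) + \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{ijk})$. When applying PTP update equations (26) and (27) to equations (23) to (25) and rewriting the update rule in terms of left messages only, the un-normalized left messages are updated as follows.

$$\lambda_{u\to c}(\mathbf{i}) = \prod_{b\in C(u)\setminus\{c\}} \left(1 - \lambda^{\mathrm{norm}}_{V(b)\setminus\{u\}\to b}(\mathbf{i})\right) - \prod_{b\in C(u)\setminus\{c\}} \left(\lambda^{\mathrm{norm}}_{V(b)\setminus\{u\}\to b}(\mathbf{j}) + \alpha_{V(b)\setminus\{u\},u}\right) - \prod_{b\in C(u)\setminus\{c\}} \left(\lambda^{\mathrm{norm}}_{V(b)\setminus\{u\}\to b}(\mathbf{k}) + \alpha_{V(b)\setminus\{u\},u}\right) + \prod_{b\in C(u)\setminus\{c\}} \alpha_{V(b)\setminus\{u\},u} \tag{32}$$

$$\lambda_{u\to c}(\mathbf{ij}) = \prod_{b\in C(u)\setminus\{c\}} \left(\lambda^{\mathrm{norm}}_{V(b)\setminus\{u\}\to b}(\mathbf{k}) + \alpha_{V(b)\setminus\{u\},u}\right) - \prod_{b\in C(u)\setminus\{c\}} \alpha_{V(b)\setminus\{u\},u} \tag{33}$$

$$\lambda_{u\to c}(\mathbf{ijk}) = \prod_{b\in C(u)\setminus\{c\}} \alpha_{V(b)\setminus\{u\},u}. \tag{34}$$

After normalization, we have

$$\lambda^{\mathrm{norm}}_{u\to c}(\mathbf{i}) = \frac{1}{\beta}\Bigg(\prod_{b\in C(u)\setminus\{c\}} \left(1 - \lambda^{\mathrm{norm}}_{V(b)\setminus\{u\}\to b}(\mathbf{i})\right) - \prod_{b\in C(u)\setminus\{c\}} \left(\lambda^{\mathrm{norm}}_{V(b)\setminus\{u\}\to b}(\mathbf{j}) + \alpha_{V(b)\setminus\{u\},u}\right) - \prod_{b\in C(u)\setminus\{c\}} \left(\lambda^{\mathrm{norm}}_{V(b)\setminus\{u\}\to b}(\mathbf{k}) + \alpha_{V(b)\setminus\{u\},u}\right) + \prod_{b\in C(u)\setminus\{c\}} \alpha_{V(b)\setminus\{u\},u}\Bigg) \tag{35}$$

$$\lambda^{\mathrm{norm}}_{u\to c}(\mathbf{ij}) = \frac{1}{\beta}\Bigg(\prod_{b\in C(u)\setminus\{c\}} \left(\lambda^{\mathrm{norm}}_{V(b)\setminus\{u\}\to b}(\mathbf{k}) + \alpha_{V(b)\setminus\{u\},u}\right) - \prod_{b\in C(u)\setminus\{c\}} \alpha_{V(b)\setminus\{u\},u}\Bigg) \tag{36}$$

$$\lambda^{\mathrm{norm}}_{u\to c}(\mathbf{ijk}) = \frac{1}{\beta} \prod_{b\in C(u)\setminus\{c\}} \alpha_{V(b)\setminus\{u\},u}, \tag{37}$$

where $\beta := \sum_{t\in(\chi^*)^{\{u\}}} \lambda_{u\to c}(t)$. It is easy to see that

$$\beta = \sum_{i=1,2,3}\ \prod_{b\in C(u)\setminus\{c\}} \left(1 - \lambda^{\mathrm{norm}}_{V(b)\setminus\{u\}\to b}(\mathbf{i})\right) - \sum_{i=1,2,3}\ \prod_{b\in C(u)\setminus\{c\}} \left(\lambda^{\mathrm{norm}}_{V(b)\setminus\{u\}\to b}(\mathbf{i}) + \alpha_{V(b)\setminus\{u\},u}\right) + \prod_{b\in C(u)\setminus\{c\}} \alpha_{V(b)\setminus\{u\},u}.$$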
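The expression for $\beta$ can be checked by summing the un-normalized left messages over the seven tokens. The following is a sketch of that bookkeeping, using the shorthand $A_i$, $B_i$ and $D$, which is introduced here only for illustration and is not notation from the paper: $A_i$ denotes the product of $(1 - \lambda^{\mathrm{norm}}(\mathbf{i}))$ terms, $B_i$ the product of $(\lambda^{\mathrm{norm}}(\mathbf{i}) + \alpha)$ terms, and $D$ the product of $\alpha$ terms, all taken over $b \in C(u)\setminus\{c\}$.

```latex
\begin{align*}
\sum_{i=1,2,3} \lambda_{u\to c}(\mathbf{i})
  &= \sum_{i} A_i - 2\sum_{i} B_i + 3D
  && \text{(each $B_i$ is subtracted in the two equations for the other two colors)}\\
\sum_{\text{pairs}} \lambda_{u\to c}(\mathbf{ij})
  &= \sum_{i} B_i - 3D
  && \text{(one term per pair, indexed by the excluded color)}\\
\lambda_{u\to c}(\mathbf{ijk}) &= D\\[4pt]
\beta &= \sum_{i} A_i - \sum_{i} B_i + D.
\end{align*}
```

Adding the three lines gives the last one, which is the stated form of $\beta$.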

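For any $c = \{u, v\}$, it is clear that when identifying $\lambda^{\mathrm{norm}}_{u\to c}(\mathbf{i})$ with $\eta^i_{u\to v}$ and identifying $\alpha_{u,v} = 1 - \sum_{i=1,2,3} \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{i})$ with $\eta^*_{u\to v}$, the update rule for the passed message $(\eta^1_{u\to v}, \eta^2_{u\to v}, \eta^3_{u\to v}, \eta^*_{u\to v})$ in SP results.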

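To prove the equivalence of PTP and SP summary messages, we can follow the same procedure as for proving the equivalence of PTP left messages and SP left messages. When applying message update equations (26) and (27) to equations (28) to (30) and rewriting summary messages in terms of left messages, the PTP summary messages are updated as follows.

$$\mu_{u}(\mathbf{i}) = \prod_{c\in C(u)} \left(1 - \lambda^{\mathrm{norm}}_{V(c)\setminus\{u\}\to c}(\mathbf{i})\right) - \prod_{c\in C(u)} \left(\lambda^{\mathrm{norm}}_{V(c)\setminus\{u\}\to c}(\mathbf{j}) + \alpha_{V(c)\setminus\{u\},u}\right) - \prod_{c\in C(u)} \left(\lambda^{\mathrm{norm}}_{V(c)\setminus\{u\}\to c}(\mathbf{k}) + \alpha_{V(c)\setminus\{u\},u}\right) + \prod_{c\in C(u)} \alpha_{V(c)\setminus\{u\},u} \tag{38}$$

$$\mu_{u}(\mathbf{ij}) = \prod_{c\in C(u)} \left(\lambda^{\mathrm{norm}}_{V(c)\setminus\{u\}\to c}(\mathbf{k}) + \alpha_{V(c)\setminus\{u\},u}\right) - \prod_{c\in C(u)} \alpha_{V(c)\setminus\{u\},u} \tag{39}$$

$$\mu_{u}(\mathbf{ijk}) = \prod_{c\in C(u)} \alpha_{V(c)\setminus\{u\},u}. \tag{40}$$

After normalization, we have

$$\mu^{\mathrm{norm}}_{u}(\mathbf{i}) = \frac{1}{\beta'}\Bigg(\prod_{c\in C(u)} \left(1 - \lambda^{\mathrm{norm}}_{V(c)\setminus\{u\}\to c}(\mathbf{i})\right) - \prod_{c\in C(u)} \left(\lambda^{\mathrm{norm}}_{V(c)\setminus\{u\}\to c}(\mathbf{j}) + \alpha_{V(c)\setminus\{u\},u}\right) - \prod_{c\in C(u)} \left(\lambda^{\mathrm{norm}}_{V(c)\setminus\{u\}\to c}(\mathbf{k}) + \alpha_{V(c)\setminus\{u\},u}\right) + \prod_{c\in C(u)} \alpha_{V(c)\setminus\{u\},u}\Bigg) \tag{41}$$

$$\mu^{\mathrm{norm}}_{u}(\mathbf{ij}) = \frac{1}{\beta'}\Bigg(\prod_{c\in C(u)} \left(\lambda^{\mathrm{norm}}_{V(c)\setminus\{u\}\to c}(\mathbf{k}) + \alpha_{V(c)\setminus\{u\},u}\right) - \prod_{c\in C(u)} \alpha_{V(c)\setminus\{u\},u}\Bigg) \tag{42}$$

$$\mu^{\mathrm{norm}}_{u}(\mathbf{ijk}) = \frac{1}{\beta'} \prod_{c\in C(u)} \alpha_{V(c)\setminus\{u\},u}, \tag{43}$$

where $\beta' := \sum_{t\in(\chi^*)^{\{u\}}} \mu_{u}(t)$. It is easy to show that

$$\beta' = \sum_{i=1,2,3}\ \prod_{c\in C(u)} \left(1 - \lambda^{\mathrm{norm}}_{V(c)\setminus\{u\}\to c}(\mathbf{i})\right) - \sum_{i=1,2,3}\ \prod_{c\in C(u)} \left(\lambda^{\mathrm{norm}}_{V(c)\setminus\{u\}\to c}(\mathbf{i}) + \alpha_{V(c)\setminus\{u\},u}\right) + \prod_{c\in C(u)} \alpha_{V(c)\setminus\{u\},u}.$$

For any $u \in V$, it is clear that when identifying $\mu^{\mathrm{norm}}_{u}(\mathbf{i})$ with $\zeta^i_u$ and identifying $1 - \sum_{i=1,2,3} \mu^{\mathrm{norm}}_{u}(\mathbf{i})$ with $\zeta^*_u$, the update rule for the summary message $(\zeta^1_u, \zeta^2_u, \zeta^3_u, \zeta^*_u)$ in SP results.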



This theorem shows that for 3-COL problems, SP is PTP. Similar results can be shown for $k$-SAT problems. Instead of showing this result, we will show in a later section a more general result, namely that weighted SP is weighted PTP for $k$-SAT problems. This supports the view that the general principle for designing SP algorithms for arbitrary CSPs is the recipe specified in the PTP message-update rule.
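The correspondence for passed messages can also be checked numerically on a small example. The sketch below is an illustration under stated assumptions: the three incoming left messages are arbitrary made-up distributions, and all function names are introduced here. It computes the PTP left message by enumeration and the SP message by the compact 3-COL formula, then compares the singleton components.

```python
from itertools import product
from math import prod

COLORS = (1, 2, 3)
FULL = frozenset(COLORS)

def tok(*colors):
    return frozenset(colors)

def right_from_left(lam):
    """Right message through a degree-2 coloring constraint: a singleton k
    forces the pair without k; any non-singleton token forces the full token."""
    rho = {FULL - {k}: lam.get(tok(k), 0.0) for k in COLORS}
    rho[FULL] = sum(p for t, p in lam.items() if len(t) > 1)
    return rho

def ptp_left(rhos):
    """PTP left message by enumeration, normalized over non-empty tokens."""
    out = {}
    for combo in product(*(r.items() for r in rhos)):
        token, weight = FULL, 1.0
        for t, p in combo:
            token, weight = token & t, weight * p
        if token:
            out[token] = out.get(token, 0.0) + weight
    total = sum(out.values())
    return {t: w / total for t, w in out.items()}

def sp_update(etas):
    """Compact SP update for 3-COL; each eta has keys 1, 2, 3 and '*'."""
    def numerator(i):
        return (prod(1 - e[i] for e in etas)
                - sum(prod(e['*'] + e[j] for e in etas) for j in COLORS if j != i)
                + prod(e['*'] for e in etas))
    denominator = (sum(prod(1 - e[j] for e in etas) for j in COLORS)
                   - sum(prod(e['*'] + e[j] for e in etas) for j in COLORS)
                   + prod(e['*'] for e in etas))
    return {i: numerator(i) / denominator for i in COLORS}

# Made-up normalized left messages from three upstream neighbors.
lams = [
    {tok(1): 0.2, tok(2): 0.1, tok(3): 0.3, tok(1, 2): 0.1, FULL: 0.3},
    {tok(1): 0.4, tok(2, 3): 0.2, FULL: 0.4},
    {tok(2): 0.5, FULL: 0.5},
]
etas = []
for lam in lams:
    eta = {i: lam.get(tok(i), 0.0) for i in COLORS}
    eta['*'] = 1.0 - sum(eta.values())
    etas.append(eta)

ptp = ptp_left([right_from_left(lam) for lam in lams])
sp = sp_update(etas)
```

The singleton entries of `ptp` agree with the entries of `sp`, while `ptp` additionally resolves the joker mass into the individual non-singleton tokens.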

In the correspondence between SP and PTP for 3-COL problems established in this theorem, it is worth noting that symbol $i$ in the SP messages corresponds to the singleton token $\mathbf{i}$ that contains the single element $i$, and symbol $*$ in the SP messages corresponds to the group of all non-singleton tokens. We note that the fact that all non-singleton tokens can be represented by a single symbol $*$ is rather a coincidence, intrinsically related to the structure of 3-COL problems, and should not be understood as a general principle. Specifically, for 3-COL problems, each constraint vertex has degree 2, and whenever a non-singleton token is passed to a constraint vertex, the outgoing token from the constraint vertex will be token $\mathbf{123}$. It is precisely due to this fact that all non-singleton tokens can be represented by the same symbol, the joker symbol $*$, as it is conventionally termed. This observation then implies that for general CSPs with non-binary alphabets, SP, or equivalently PTP, may be expected to contain more than one "joker" symbol, each corresponding to one or several non-singleton tokens. In other words, this suggests that the notion of a "joker" symbol in SP messages is not a fundamental one, and that the more fundamental perspective on SP is the extension of the variable alphabet to its power set with the empty set excluded, or equivalently, via a one-to-one correspondence, to the set of all tokens associated with the variable.

Finally, we remark that there is a caveat on whether SP and PTP are exactly equivalent once the decimation procedure associated with SP algorithms is taken into account. Specifically, we note that decimation is performed based on summary messages in SP. For 3-COL problems, each SP summary message contains "biases" on four different symbols, but each PTP summary message contains "biases" on seven different tokens. The natural decimation procedure for PTP is then to fix one "highly biased" variable to one of the seven tokens, rather than to one of the four symbols. Although it is not clear at this point whether this finer procedure provides gains in algorithm performance, it nevertheless suggests that PTP is slightly more general than SP. Investigating the possible benefit of this slight generality can be an interesting direction of research.

E. Weighted PTP

In the mechanism of passing random tokens that underlies the PTP message-passing rule, the outgoing token sent from a variable vertex is a function of all incoming tokens from its upstream. A natural way to generalize the dependency of these outgoing tokens on the incoming tokens is to generalize this functional dependency to a probabilistic dependency. Specifically, using the "intention-command" analogy, this probabilistic dependency allows the intention of a variable, conditioned on all incoming commands from the upstream, to take any set of values (not necessarily the maximal set) that obeys the commands, and this probabilistic dependency is specified via the probability of each allowed intention. This results in what we call weighted PTP.

In weighted PTP, we assume that the token $t_{v\to c}$ passed from variable vertex $x_v$ to constraint vertex $\Gamma_c$ may be any subset of the intersection of all incoming tokens passed to $x_v$ except that passed from $\Gamma_c$, and the probability that token $t_{v\to c}$ equals each subset is specified via a non-negative function $\omega_v(a|b)$ defined on $(\chi^*)^{\{v\}} \times \big((\chi^*)^{\{v\}} \cup \{\emptyset_v\}\big)$, for each $v \in V$. We will restrict $\omega_v(a|b)$ to be an obedience conditional on $(\chi^*)^{\{v\}}$, the definition of which is given as follows.

Definition 1 (Obedience Conditional): A non-negative function $h(a|b)$ on $(\chi^*)^{\{v\}} \times \big((\chi^*)^{\{v\}} \cup \{\emptyset_v\}\big)$ is said to be an obedience conditional on $(\chi^*)^{\{v\}}$ if $h(a|\emptyset_v) = 0$ for all $a \in (\chi^*)^{\{v\}}$ and $h(a|b) = 0$ for any $a, b \in (\chi^*)^{\{v\}}$ with $a \not\subseteq b$.

First we note that in the definition, variable $a$ in $h(\cdot)$ is intended to refer to an "intention", variable $b$ is intended to refer to a "command", and function $h$ evaluates to zero if the command is null or if the intention does not obey the command. This is the reason for which we name such a function an "obedience" conditional. Second, it is also worth noting that an obedience conditional $h$ as defined above is not a true conditional distribution, since it is not the case that $\sum_a h(a|b) = 1$ for all $b$. However, it is a minor technicality to modify the definition of $h$ (without impacting the development of any result in this paper) so that it is indeed a conditional distribution (see the footnote following the weighted PTP message-update rule). Thus for the purpose of this paper, one may always regard an obedience conditional as a conditional distribution of an intention given a command.

Clearly, function $[a = b]$ is a special case of an obedience conditional, characterizing a special functional dependency of intention $a$ on command $b$, namely that the intention set $a$ is exactly the command set $b$.

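We now give the precise message-update rule of weighted PTP, where the only difference from PTP is in the left message and the summary message.

Weighted PTP Message-Update Rule

$$\lambda_{v\to c}(t_{v\to c}) := \sum_{(t_{b\to v})_{b\in C(v)\setminus\{c\}}} \omega_v\Big(t_{v\to c}\ \Big|\ \bigcap_{b\in C(v)\setminus\{c\}} t_{b\to v}\Big) \prod_{b\in C(v)\setminus\{c\}} \rho^{\mathrm{norm}}_{b\to v}(t_{b\to v}) \tag{44}$$

$$\rho_{c\to v}(t_{c\to v}) := \sum_{(t_{u\to c})_{u\in V(c)\setminus\{v\}}} \Big[t_{c\to v} = F_c\Big(\prod_{u\in V(c)\setminus\{v\}} t_{u\to c}\Big)\Big] \prod_{u\in V(c)\setminus\{v\}} \lambda^{\mathrm{norm}}_{u\to c}(t_{u\to c}) \tag{45}$$

$$\mu_v(t_v) := \sum_{(t_{c\to v})_{c\in C(v)}} \omega_v\Big(t_v\ \Big|\ \bigcap_{c\in C(v)} t_{c\to v}\Big) \prod_{c\in C(v)} \rho^{\mathrm{norm}}_{c\to v}(t_{c\to v}) \tag{46}$$

and the normalized messages are defined as

$$\lambda^{\mathrm{norm}}_{v\to c}(t) := \lambda_{v\to c}(t) \Big/ \sum_{t'\in(\chi^*)^{\{v\}}} \lambda_{v\to c}(t') \tag{47}$$

$$\rho^{\mathrm{norm}}_{c\to v}(t) := \rho_{c\to v}(t) \Big/ \sum_{t'\in(\chi^*)^{\{v\}}} \rho_{c\to v}(t') \tag{48}$$

$$\mu^{\mathrm{norm}}_{v}(t) := \mu_{v}(t) \Big/ \sum_{t'\in(\chi^*)^{\{v\}}} \mu_{v}(t') \tag{49}$$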



Footnote: Given an obedience conditional $h$, we may define a conditional distribution $\tilde{h}(a|b)$. Let $Z$ be $\max_{b\in(\chi^*)^{\{v\}}} \sum_{a\in(\chi^*)^{\{v\}}} h(a|b)$. Let non-negative function $\tilde{h}(a|b)$ on $\big((\chi^*)^{\{v\}} \cup \{\emptyset_v\}\big) \times \big((\chi^*)^{\{v\}} \cup \{\emptyset_v\}\big)$ be defined as follows: $\tilde{h}(a|\emptyset_v) := [a = \emptyset_v]$; $\tilde{h}(\emptyset_v|b) := 1 - \sum_{a\in(\chi^*)^{\{v\}}} h(a|b)/Z$ for all $b \neq \emptyset_v$; and for all other $(a, b)$, $\tilde{h}(a|b) := h(a|b)/Z$. It is easy to see that $\tilde{h}(a|b)$ is a conditional distribution. Since eventually we will condition on $a \neq \emptyset$, it is straightforward to verify that the role of $\tilde{h}$ is equivalent to that of $h$.

It is easily seen that weighted PTP is a family of algorithms, parametrized by a collection of obedience conditionals $\{\omega_v : v \in V\}$, one for each coordinate. The fact that conditional distribution $\omega_v(a|b)$ generalizes indicator function $[a = b]$ immediately implies that weighted PTP generalizes PTP, as stated in the following lemma.

Lemma 4: If $\omega_v(a|b) := [a = b]$ for all $v \in V$, then weighted PTP is PTP.

F. Weighted PTP Generalizes Weighted SP 

Now we show that the weighted SP developed for $k$-SAT problems in [15] is a special case of weighted PTP. That is, for $k$-SAT problems, when the functions $\{\omega_v : v \in V\}$ in weighted PTP are set to a particular form, weighted SP, or SP*($\gamma$), results.

For a $k$-SAT problem, let function $\omega_v(a|b)$ for every $v \in V$ in weighted PTP be defined via a single real number $\gamma \in [0, 1]$ as follows.

$$\omega_v(a|b) := \begin{cases} \gamma, & \text{if } a = b = \mathbf{01}\\ 1 - \gamma, & \text{if } a \subset b = \mathbf{01}\\ 1, & \text{if } a = b \neq \mathbf{01}\\ 0, & \text{otherwise} \end{cases} \tag{50}$$

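Lemma 5: Let $\{\omega_v : v \in V\}$ in $k$-SAT be defined as in (50). The message-update rule of weighted PTP is then:

$$\lambda_{v\to c}(\mathbf{0}) := \prod_{b\in C(v)\setminus\{c\}} \left(\rho^{\mathrm{norm}}_{b\to v}(\mathbf{0}) + \rho^{\mathrm{norm}}_{b\to v}(\mathbf{01})\right) - \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\mathrm{norm}}_{b\to v}(\mathbf{01}) \tag{51}$$

$$\lambda_{v\to c}(\mathbf{1}) := \prod_{b\in C(v)\setminus\{c\}} \left(\rho^{\mathrm{norm}}_{b\to v}(\mathbf{1}) + \rho^{\mathrm{norm}}_{b\to v}(\mathbf{01})\right) - \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\mathrm{norm}}_{b\to v}(\mathbf{01}) \tag{52}$$

$$\lambda_{v\to c}(\mathbf{01}) := \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\mathrm{norm}}_{b\to v}(\mathbf{01}) \tag{53}$$

$$\rho_{c\to v}(\mathbf{0}) := [L_{v,c} = 0] \cdot \prod_{u\in V(c)\setminus\{v\}: L_{u,c}=1} \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{0}) \cdot \prod_{u\in V(c)\setminus\{v\}: L_{u,c}=0} \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{1}) \tag{54}$$

$$\rho_{c\to v}(\mathbf{1}) := [L_{v,c} = 1] \cdot \prod_{u\in V(c)\setminus\{v\}: L_{u,c}=1} \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{0}) \cdot \prod_{u\in V(c)\setminus\{v\}: L_{u,c}=0} \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{1}) \tag{55}$$

$$\rho_{c\to v}(\mathbf{01}) := 1 - \prod_{u\in V(c)\setminus\{v\}: L_{u,c}=1} \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{0}) \cdot \prod_{u\in V(c)\setminus\{v\}: L_{u,c}=0} \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{1}) \tag{56}$$

$$\mu_{v}(\mathbf{0}) := \prod_{c\in C(v)} \left(\rho^{\mathrm{norm}}_{c\to v}(\mathbf{0}) + \rho^{\mathrm{norm}}_{c\to v}(\mathbf{01})\right) - \gamma \prod_{c\in C(v)} \rho^{\mathrm{norm}}_{c\to v}(\mathbf{01}) \tag{57}$$

$$\mu_{v}(\mathbf{1}) := \prod_{c\in C(v)} \left(\rho^{\mathrm{norm}}_{c\to v}(\mathbf{1}) + \rho^{\mathrm{norm}}_{c\to v}(\mathbf{01})\right) - \gamma \prod_{c\in C(v)} \rho^{\mathrm{norm}}_{c\to v}(\mathbf{01}) \tag{58}$$

$$\mu_{v}(\mathbf{01}) := \gamma \prod_{c\in C(v)} \rho^{\mathrm{norm}}_{c\to v}(\mathbf{01}). \tag{59}$$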

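Proof: These update equations can be immediately obtained from weighted PTP message update equations (44) to (46), where (56) follows from

$$\begin{aligned}
\rho_{c\to v}(\mathbf{01}) &= \prod_{u\in V(c)\setminus\{v\}} \left(\lambda^{\mathrm{norm}}_{u\to c}(\mathbf{0}) + \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{1}) + \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{01})\right) - \prod_{u\in V(c)\setminus\{v\}: L_{u,c}=1} \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{0}) \cdot \prod_{u\in V(c)\setminus\{v\}: L_{u,c}=0} \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{1})\\
&= 1 - \prod_{u\in V(c)\setminus\{v\}: L_{u,c}=1} \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{0}) \cdot \prod_{u\in V(c)\setminus\{v\}: L_{u,c}=0} \lambda^{\mathrm{norm}}_{u\to c}(\mathbf{1}).
\end{aligned}$$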
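The closed-form left messages of the lemma can be cross-checked against a direct enumeration of the weighted PTP left-message rule with the $\gamma$-parametrized obedience conditional. The sketch below is illustrative; the function names, the example right messages, and the value of $\gamma$ are all assumptions introduced here.

```python
from itertools import product

ZERO, ONE, BOTH = frozenset({0}), frozenset({1}), frozenset({0, 1})

def omega(a, b, gamma):
    """Obedience conditional for k-SAT: intention a given (non-empty) command b."""
    if b == BOTH:
        return gamma if a == BOTH else 1.0 - gamma
    return 1.0 if a == b else 0.0

def weighted_left(rhos, gamma):
    """Un-normalized weighted PTP left message by enumeration."""
    out = {ZERO: 0.0, ONE: 0.0, BOTH: 0.0}
    for combo in product(*(r.items() for r in rhos)):
        command, weight = BOTH, 1.0
        for t, p in combo:
            command, weight = command & t, weight * p
        if not command:
            continue  # a null command contributes nothing
        for intention in out:
            out[intention] += omega(intention, command, gamma) * weight
    return out

# Made-up normalized right messages from three upstream constraints.
rhos = [
    {ZERO: 0.3, BOTH: 0.7},
    {ONE: 0.2, BOTH: 0.8},
    {ZERO: 0.5, BOTH: 0.5},
]
left = weighted_left(rhos, gamma=0.4)
```

Setting `gamma=1.0` recovers the non-weighted PTP left message, in line with the lemma stating that weighted PTP reduces to PTP when the obedience conditional is the indicator $[a = b]$.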



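Theorem 2: Let $\{\omega_v : v \in V\}$ in a $k$-SAT problem be defined as in (50). Denote by $(\Pi^{u,\mathrm{norm}}_{v\to c}, \Pi^{s,\mathrm{norm}}_{v\to c}, \Pi^{*,\mathrm{norm}}_{v\to c})$ the normalized version of SP message $(\Pi^{u}_{v\to c}, \Pi^{s}_{v\to c}, \Pi^{*}_{v\to c})$, namely, each component divided by $\Pi^{u}_{v\to c} + \Pi^{s}_{v\to c} + \Pi^{*}_{v\to c}$. Then the correspondence between the SP*($\gamma$) message-update rule and the weighted PTP message-update rule is

$$\Pi^{s,\mathrm{norm}}_{v\to c} \leftrightarrow [L_{v,c} = 0] \cdot \lambda^{\mathrm{norm}}_{v\to c}(\mathbf{0}) + [L_{v,c} = 1] \cdot \lambda^{\mathrm{norm}}_{v\to c}(\mathbf{1}) \tag{60}$$

$$\Pi^{u,\mathrm{norm}}_{v\to c} \leftrightarrow [L_{v,c} = 0] \cdot \lambda^{\mathrm{norm}}_{v\to c}(\mathbf{1}) + [L_{v,c} = 1] \cdot \lambda^{\mathrm{norm}}_{v\to c}(\mathbf{0}) \tag{61}$$

$$\Pi^{*,\mathrm{norm}}_{v\to c} \leftrightarrow \lambda^{\mathrm{norm}}_{v\to c}(\mathbf{01}) \tag{62}$$

$$\eta_{c\to v} \leftrightarrow \rho^{\mathrm{norm}}_{c\to v}(\mathbf{0}) + \rho^{\mathrm{norm}}_{c\to v}(\mathbf{1}) \tag{63}$$

$$\zeta^0_v \leftrightarrow \mu_v(\mathbf{0}) \tag{64}$$

$$\zeta^1_v \leftrightarrow \mu_v(\mathbf{1}) \tag{65}$$

$$\zeta^*_v \leftrightarrow \mu_v(\mathbf{01}). \tag{66}$$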


Prior to proving the theorem, we introduce some notation and a simple lemma that will be
useful in the proof. For any neighboring variable vertex $x_v$ and constraint vertex $\Gamma_c$,
we will denote by $\mathbf{L}_{v,c}$ the singleton token containing the single elementary assignment that
assigns coordinate $v$ the edge label $L_{v,c}$. Similarly, we will denote by $\bar{\mathbf{L}}_{v,c}$ the singleton token
containing the single elementary assignment that assigns coordinate $v$ the negated edge label
$\bar{L}_{v,c}$. With this notation, the following lemma follows immediately from Lemma 5.



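Lemma 6: For any $(v - c)$ pair in a $k$-SAT problem, the right message $\rho^{\rm norm}_{c\to v}$ satisfies:

\begin{align}
\rho^{\rm norm}_{c\to v}(\mathbf{L}_{v,c})+\rho^{\rm norm}_{c\to v}(01) &= 1 \tag{67}\\
\rho^{\rm norm}_{c\to v}(\bar{\mathbf{L}}_{v,c})+\rho^{\rm norm}_{c\to v}(01) &= \rho^{\rm norm}_{c\to v}(01). \tag{68}
\end{align}

Now we are ready to prove Theorem 2.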



Proof: We will refer to the message correspondence in Equations (60) to (62) as the "left
correspondence", the correspondence in (63) as the "right correspondence", and the
correspondence in Equations (64) to (66) as the "summary correspondence".

We will prove the theorem by first showing that if the left correspondence holds, then the
right correspondence holds, and conversely that if the right correspondence holds, then the left
correspondence holds. This establishes the correspondence between SP$^*(\gamma)$ and weighted
PTP in their passed messages. We will then complete the proof by showing the summary
correspondence.

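First suppose that the left correspondence holds, namely that
$\Pi^{s,{\rm norm}}_{v\to c} = [L_{v,c}=0]\cdot\lambda^{\rm norm}_{v\to c}(0) + [L_{v,c}=1]\cdot\lambda^{\rm norm}_{v\to c}(1)$,
$\Pi^{u,{\rm norm}}_{v\to c} = [L_{v,c}=0]\cdot\lambda^{\rm norm}_{v\to c}(1) + [L_{v,c}=1]\cdot\lambda^{\rm norm}_{v\to c}(0)$, and
$\Pi^{*,{\rm norm}}_{v\to c} = \lambda^{\rm norm}_{v\to c}(01)$.
In each iteration, by Lemma 5 and the fact that $[L_{v,c}=1] + [L_{v,c}=0] = 1$ for every $(v - c)$ pair,
the right messages satisfy

\begin{align*}
\rho_{c\to v}(0)+\rho_{c\to v}(1)+\rho_{c\to v}(01)
={}& [L_{v,c}=0]\cdot \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=1} \lambda^{\rm norm}_{u\to c}(0) \cdot \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=0} \lambda^{\rm norm}_{u\to c}(1)\\
&+ [L_{v,c}=1]\cdot \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=1} \lambda^{\rm norm}_{u\to c}(0) \cdot \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=0} \lambda^{\rm norm}_{u\to c}(1)\\
&+ 1 - \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=1} \lambda^{\rm norm}_{u\to c}(0) \cdot \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=0} \lambda^{\rm norm}_{u\to c}(1)\\
={}& 1.
\end{align*}

That is, each right message $\rho_{c\to v}$ is already normalized, or $\rho_{c\to v} = \rho^{\rm norm}_{c\to v}$. Then

\begin{align*}
\rho^{\rm norm}_{c\to v}(0)+\rho^{\rm norm}_{c\to v}(1)
={}& [L_{v,c}=0]\cdot \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=1} \lambda^{\rm norm}_{u\to c}(0) \cdot \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=0} \lambda^{\rm norm}_{u\to c}(1)\\
&+ [L_{v,c}=1]\cdot \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=1} \lambda^{\rm norm}_{u\to c}(0) \cdot \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=0} \lambda^{\rm norm}_{u\to c}(1)\\
={}& \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=1} \lambda^{\rm norm}_{u\to c}(0) \cdot \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=0} \lambda^{\rm norm}_{u\to c}(1)\\
={}& \prod_{u\in V(c)\setminus\{v\}} \left([L_{u,c}=1]\cdot\lambda^{\rm norm}_{u\to c}(0) + [L_{u,c}=0]\cdot\lambda^{\rm norm}_{u\to c}(1)\right)\\
\overset{(a)}{=}{}& \prod_{u\in V(c)\setminus\{v\}} \Pi^{u,{\rm norm}}_{u\to c}\\
\overset{(b)}{=}{}& \prod_{u\in V(c)\setminus\{v\}} \frac{\Pi^{u}_{u\to c}}{\Pi^{s}_{u\to c}+\Pi^{u}_{u\to c}+\Pi^{*}_{u\to c}},
\end{align*}

where equality (a) is due to the assumed left correspondence, and equality (b) follows from the
definition of $\Pi^{u,{\rm norm}}_{u\to c}$. Thus we have shown that if the left correspondence holds, then the right
correspondence holds.

Now suppose that the right correspondence holds, namely that $\eta_{c\to v} = \rho^{\rm norm}_{c\to v}(0) + \rho^{\rm norm}_{c\to v}(1)$
for every $(v - c)$ pair. Following the PTP message-update equations (51) to (53), we have

\begin{align*}
& [L_{v,c}=0]\cdot\lambda_{v\to c}(0) + [L_{v,c}=1]\cdot\lambda_{v\to c}(1)\\
={}& [L_{v,c}=0]\cdot\Big(\prod_{b\in C(v)\setminus\{c\}} \big(\rho^{\rm norm}_{b\to v}(0)+\rho^{\rm norm}_{b\to v}(01)\big) - \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm norm}_{b\to v}(01)\Big)\\
&+ [L_{v,c}=1]\cdot\Big(\prod_{b\in C(v)\setminus\{c\}} \big(\rho^{\rm norm}_{b\to v}(1)+\rho^{\rm norm}_{b\to v}(01)\big) - \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm norm}_{b\to v}(01)\Big)\\
={}& [L_{v,c}=0]\cdot\prod_{b\in C(v)\setminus\{c\}} \big(\rho^{\rm norm}_{b\to v}(0)+\rho^{\rm norm}_{b\to v}(01)\big) + [L_{v,c}=1]\cdot\prod_{b\in C(v)\setminus\{c\}} \big(\rho^{\rm norm}_{b\to v}(1)+\rho^{\rm norm}_{b\to v}(01)\big)\\
&- \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm norm}_{b\to v}(01)\\
={}& [L_{v,c}=0]\cdot\prod_{b\in C^{s}_{c}(v)} \big(\rho^{\rm norm}_{b\to v}(0)+\rho^{\rm norm}_{b\to v}(01)\big)\cdot\prod_{b\in C^{u}_{c}(v)} \rho^{\rm norm}_{b\to v}(01)\\
&+ [L_{v,c}=1]\cdot\prod_{b\in C^{s}_{c}(v)} \big(\rho^{\rm norm}_{b\to v}(1)+\rho^{\rm norm}_{b\to v}(01)\big)\cdot\prod_{b\in C^{u}_{c}(v)} \rho^{\rm norm}_{b\to v}(01) - \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm norm}_{b\to v}(01)\\
\overset{(67)}{=}{}& [L_{v,c}=0]\cdot\prod_{b\in C^{u}_{c}(v)} \rho^{\rm norm}_{b\to v}(01) + [L_{v,c}=1]\cdot\prod_{b\in C^{u}_{c}(v)} \rho^{\rm norm}_{b\to v}(01) - \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm norm}_{b\to v}(01)\\
={}& \prod_{b\in C^{u}_{c}(v)} \rho^{\rm norm}_{b\to v}(01) - \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm norm}_{b\to v}(01)\\
={}& \prod_{b\in C^{u}_{c}(v)} \rho^{\rm norm}_{b\to v}(01)\cdot\Big(1 - \gamma \prod_{b\in C^{s}_{c}(v)} \rho^{\rm norm}_{b\to v}(01)\Big)\\
={}& \prod_{b\in C^{u}_{c}(v)} \big(1 - \rho^{\rm norm}_{b\to v}(0) - \rho^{\rm norm}_{b\to v}(1)\big)\cdot\Big(1 - \gamma \prod_{b\in C^{s}_{c}(v)} \big(1 - \rho^{\rm norm}_{b\to v}(0) - \rho^{\rm norm}_{b\to v}(1)\big)\Big)\\
\overset{(c)}{=}{}& \prod_{b\in C^{u}_{c}(v)} (1-\eta_{b\to v})\cdot\Big(1 - \gamma \prod_{b\in C^{s}_{c}(v)} (1-\eta_{b\to v})\Big)\\
={}& \Pi^{s}_{v\to c},
\end{align*}

where $C^{s}_{c}(v)$ and $C^{u}_{c}(v)$ denote the constraints $b \in C(v)\setminus\{c\}$ whose edge label $L_{v,b}$ agrees,
respectively disagrees, with $L_{v,c}$, and equality (c) is due to the assumed right correspondence.
We will denote this result by (A).

Following very similar procedures, it can be shown that

\begin{align*}
[L_{v,c}=0]\cdot\lambda_{v\to c}(1) + [L_{v,c}=1]\cdot\lambda_{v\to c}(0)
&= \prod_{b\in C^{s}_{c}(v)} \rho^{\rm norm}_{b\to v}(01)\cdot\Big(1 - \gamma \prod_{b\in C^{u}_{c}(v)} \rho^{\rm norm}_{b\to v}(01)\Big)\\
&= \prod_{b\in C^{s}_{c}(v)} (1-\eta_{b\to v})\cdot\Big(1 - \gamma \prod_{b\in C^{u}_{c}(v)} (1-\eta_{b\to v})\Big)\\
&= \Pi^{u}_{v\to c}.
\end{align*}

We will denote this result by (B).
Similarly,

\begin{align*}
\lambda_{v\to c}(01) &= \gamma \prod_{b\in C^{s}_{c}(v)} \big(1 - \rho^{\rm norm}_{b\to v}(0) - \rho^{\rm norm}_{b\to v}(1)\big) \cdot \prod_{b\in C^{u}_{c}(v)} \big(1 - \rho^{\rm norm}_{b\to v}(0) - \rho^{\rm norm}_{b\to v}(1)\big)\\
&= \Pi^{*}_{v\to c}.
\end{align*}

We will denote this result by (C).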
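The algebra leading to result (A) can be spot-checked numerically. The sketch below is illustrative only: the helper names and the representation of each neighboring constraint $b$ as a pair (edge label $L_{v,b}$, value $\eta_{b\to v}$) are assumptions made here. It builds right messages from $\eta$ via the right correspondence, evaluates the left update (51)/(52) at the edge label, and compares the result with the product form reached at the end of the derivation.

```python
from math import prod


def rho_from_eta(label, eta):
    """Right message with mass eta on the edge label and 1 - eta on token 01."""
    return {label: eta, 1 - label: 0.0, "01": 1.0 - eta}


def left_update(value, rhos, gamma):
    """Equations (51)/(52) evaluated at a singleton value (0 or 1)."""
    return (prod(r[value] + r["01"] for r in rhos)
            - gamma * prod(r["01"] for r in rhos))


def product_form(label_vc, neighbours, gamma):
    """Final product form of result (A).

    neighbours: list of (L_{v,b}, eta_{b->v}) for b in C(v) minus {c}.
    """
    same = prod(1.0 - eta for label, eta in neighbours if label == label_vc)
    opposite = prod(1.0 - eta for label, eta in neighbours if label != label_vc)
    return opposite * (1.0 - gamma * same)
```

Calling `left_update(label_vc, [rho_from_eta(l, e) for l, e in neighbours], gamma)` and `product_form(label_vc, neighbours, gamma)` on the same inputs should agree up to floating-point rounding.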

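Combining results (A), (B) and (C), we have

\[
\lambda_{v\to c}(0) + \lambda_{v\to c}(1) + \lambda_{v\to c}(01) = \Pi^{s}_{v\to c} + \Pi^{u}_{v\to c} + \Pi^{*}_{v\to c}.
\]

That is, the scaling constant for normalizing $(\lambda_{v\to c}(0), \lambda_{v\to c}(1), \lambda_{v\to c}(01))$ and that for
normalizing $(\Pi^{s}_{v\to c}, \Pi^{u}_{v\to c}, \Pi^{*}_{v\to c})$ are identical. Then results (A), (B) and (C) respectively translate
to

\begin{align*}
[L_{v,c}=0]\cdot\lambda^{\rm norm}_{v\to c}(0) + [L_{v,c}=1]\cdot\lambda^{\rm norm}_{v\to c}(1) &= \Pi^{s,{\rm norm}}_{v\to c}\\
[L_{v,c}=0]\cdot\lambda^{\rm norm}_{v\to c}(1) + [L_{v,c}=1]\cdot\lambda^{\rm norm}_{v\to c}(0) &= \Pi^{u,{\rm norm}}_{v\to c}\\
\lambda^{\rm norm}_{v\to c}(01) &= \Pi^{*,{\rm norm}}_{v\to c}.
\end{align*}

At this point we have established the correspondence between the passed messages in weighted
PTP and those in weighted SP. We now prove the summary correspondence.

Starting from Lemma 5, and writing $C^{0}(v)$ and $C^{1}(v)$ for the constraints in $C(v)$ whose edge label
at $v$ is 0 and 1 respectively, we have

\begin{align*}
\mu_v(0) &= \prod_{c\in C(v)} \big(\rho^{\rm norm}_{c\to v}(0)+\rho^{\rm norm}_{c\to v}(01)\big) - \gamma \prod_{c\in C(v)} \rho^{\rm norm}_{c\to v}(01)\\
&= \prod_{c\in C^{1}(v)} \big(\rho^{\rm norm}_{c\to v}(0)+\rho^{\rm norm}_{c\to v}(01)\big) \prod_{c\in C^{0}(v)} \big(\rho^{\rm norm}_{c\to v}(0)+\rho^{\rm norm}_{c\to v}(01)\big) - \gamma \prod_{c\in C(v)} \rho^{\rm norm}_{c\to v}(01)\\
&\overset{(67),(68)}{=} \prod_{c\in C^{1}(v)} \rho^{\rm norm}_{c\to v}(01) - \gamma \prod_{c\in C(v)} \rho^{\rm norm}_{c\to v}(01)\\
&= \Big(1 - \gamma \prod_{c\in C^{0}(v)} \rho^{\rm norm}_{c\to v}(01)\Big) \prod_{c\in C^{1}(v)} \rho^{\rm norm}_{c\to v}(01)\\
&= \Big(1 - \gamma \prod_{c\in C^{0}(v)} \big(1-\rho^{\rm norm}_{c\to v}(0)-\rho^{\rm norm}_{c\to v}(1)\big)\Big) \prod_{c\in C^{1}(v)} \big(1-\rho^{\rm norm}_{c\to v}(0)-\rho^{\rm norm}_{c\to v}(1)\big)\\
&\overset{(d)}{=} \Big(1 - \gamma \prod_{c\in C^{0}(v)} (1-\eta_{c\to v})\Big) \prod_{c\in C^{1}(v)} (1-\eta_{c\to v})\\
&= \zeta^{0}_{v},
\end{align*}

where (d) above is due to the right correspondence that we just proved.
Symmetrically, it can be shown that

\begin{align*}
\mu_v(1) &= \Big(1 - \gamma \prod_{c\in C^{1}(v)} \big(1-\rho^{\rm norm}_{c\to v}(0)-\rho^{\rm norm}_{c\to v}(1)\big)\Big) \prod_{c\in C^{0}(v)} \big(1-\rho^{\rm norm}_{c\to v}(0)-\rho^{\rm norm}_{c\to v}(1)\big)\\
&= \Big(1 - \gamma \prod_{c\in C^{1}(v)} (1-\eta_{c\to v})\Big) \prod_{c\in C^{0}(v)} (1-\eta_{c\to v})\\
&= \zeta^{1}_{v}.
\end{align*}

Finally, it is straightforward to see

\begin{align*}
\mu_v(01) &= \gamma \prod_{c\in C^{0}(v)} \big(1-\rho^{\rm norm}_{c\to v}(0)-\rho^{\rm norm}_{c\to v}(1)\big) \prod_{c\in C^{1}(v)} \big(1-\rho^{\rm norm}_{c\to v}(0)-\rho^{\rm norm}_{c\to v}(1)\big)\\
&= \gamma \prod_{c\in C^{0}(v)} (1-\eta_{c\to v}) \prod_{c\in C^{1}(v)} (1-\eta_{c\to v})\\
&= \zeta^{*}_{v}.
\end{align*}

This proves the summary correspondence and completes the proof. ■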
This theorem asserts that weighted SP developed for $k$-SAT problems is an instance of the
weighted PTP that we propose in this paper or, alternatively phrased, that weighted PTP generalizes
weighted SP from the context of $k$-SAT problems to arbitrary CSPs with arbitrary variable
alphabets. Setting the parameter $\gamma$ to 1, this result immediately implies that non-weighted SP
is non-weighted PTP for $k$-SAT problems.

Additionally, we note that in the correspondence between the summary messages of weighted
PTP and weighted SP in the above theorem, it is clear that symbols 0, 1, and $*$ in weighted SP
(or SP) correspond to tokens (sets) 0, 1 and 01 respectively. In addition, if we use the notation
$\mathbf{L}_{v,c}$ and $\bar{\mathbf{L}}_{v,c}$, we may rewrite the correspondence between the left messages of weighted SP and those
of weighted PTP in the above theorem as

\begin{align*}
\Pi^{s}_{v\to c} &\leftrightarrow \lambda_{v\to c}(\mathbf{L}_{v,c})\\
\Pi^{u}_{v\to c} &\leftrightarrow \lambda_{v\to c}(\bar{\mathbf{L}}_{v,c})\\
\Pi^{*}_{v\to c} &\leftrightarrow \lambda_{v\to c}(01).
\end{align*}

That is, symbols "s" and "u" in SP respectively correspond to the singleton sets $\mathbf{L}_{v,c}$ and $\bar{\mathbf{L}}_{v,c}$. These
observations suggest that, although blurred by the addition of the single symbol $*$ to the variable
alphabet, the true alphabet used as the support of SP messages is the set of all tokens associated
with the variable, or equivalently, the power set of the original alphabet with the empty set
removed.
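The token alphabet described above, the power set of the variable alphabet minus the empty set, is easy to enumerate. The following sketch uses a hypothetical helper `tokens` (not a name from the paper) and represents each token as a frozenset.

```python
from itertools import combinations


def tokens(alphabet):
    """Return all non-empty subsets of the alphabet, smallest first."""
    symbols = sorted(alphabet)
    return [
        frozenset(subset)
        for size in range(1, len(symbols) + 1)
        for subset in combinations(symbols, size)
    ]
```

For the binary alphabet {0, 1} this yields the three tokens 0, 1 and 01 that support the SP messages; for a three-color alphabet it yields seven tokens.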

At this point, questions may naturally arise as to what PTP and weighted PTP do
towards the goal of solving a CSP. Although a rigorous answer to this question remains largely
open at this point, we present some preliminary results in Appendix B. From Appendix B,
intuitively one may view PTP or weighted PTP as essentially updating a random rectangle
whose sides are independently distributed random variables; as PTP iterates, it drives some side
of the random rectangle towards being deterministically biased to a singleton that contains the
solution of the CSP. The reader is referred to Appendix B for a more detailed exposition.

VI. The Reduction of SP from BP 

At this point, we have identified SP with an equivalent but probabilistically interpretable
algorithmic procedure, PTP, and generalized weighted SP from the special case of $k$-SAT and
binary problems to arbitrary CSPs, in terms of weighted PTP. We are now in a position to
discuss the reduction of SP from BP, where we will refer to SP exclusively as PTP, and weighted
SP exclusively as weighted PTP.

As is well known, the derivation of the BP algorithm is based on a well-defined factoring
function, or seen from a probabilistic perspective, a Markov random field (MRF). Thus, whether
PTP or weighted PTP may be reduced from BP boils down to whether there is an MRF
formulation on which the derived BP algorithm coincides with PTP or weighted PTP. In [15],
an MRF is constructed for the $k$-SAT problem, on which BP reduces to what we now call weighted
PTP. In [17], similar results are shown using a different MRF formalism, where (generalized)
states are introduced and the MRF is represented by a Forney graph or normal realization [18].
Although in some sense the normally realized MRF formalism of [17] is equivalent to the MRF
of [15], the Forney-graph formalism in [17] makes the development cleaner and more transparent,
and the explicit introduction of states provides a better correspondence with the weighted PTP
messages.

In this section, we first generalize the MRF formalism, in the style of [15] or [17], to arbitrary
CSPs, and derive the corresponding BP algorithm. We then investigate whether the derived BP
algorithm may be reduced to PTP or weighted PTP. We will begin this investigation with the
special case of $k$-SAT problems, and then proceed to the 3-COL problems and to general CSPs.
Re-developing the results of [15] and [17] for $k$-SAT problems, we show that the BP algorithm on
the normally realized MRF is readily reducible to weighted PTP as long as the BP messages are
initialized to satisfy a certain condition. We note that this condition, when satisfied in the first
BP iteration, will necessarily be satisfied in later iterations in $k$-SAT problems. Identifying the
important role of this condition, we call it the state-decoupling condition. However,
as we proceed to show, in 3-COL problems it is impossible for the state-decoupling condition to
hold non-trivially across all BP iterations. Nevertheless, if one manually manipulates the BP
messages to impose this condition in every iteration, which results in a modified BP
message-update rule referred to as state-decoupled BP, or SDBP for short, then the (SD)BP messages
will still reduce to PTP. This on the one hand justifies the role of the state-decoupling condition in
BP-to-PTP reduction, and on the other hand suggests that for general CSPs, PTP (or SP) is not
a special case of the BP algorithm. We then proceed further by investigating whether the
state-decoupling condition is sufficient for BP to reduce to PTP or weighted PTP for general CSPs.
To that end, we show that yet another "local compatibility" condition concerning the structure
of the CSP (in terms of the interaction between neighboring constraints) is required for SDBP
to reduce to PTP or weighted PTP.

A. Normally Realized Markov Random Field 

Given a CSP represented by factor graph $G$, we now define its corresponding normally
realized Markov random field $\mathcal{G}$ using a Forney graph representation [18]. We note that the random
variables involved in the probability mass function (PMF) represented by $\mathcal{G}$ are no longer those
associated with factor graph (or equivalently MRF) $G$, but rather a new set of random variables,
each distributed over the set of tokens associated with a coordinate. Additionally, as the central
component of the Forney graph, another set of random variables, typically called generalized
states or simply states, is also included.

Specifically, as a graph, $\mathcal{G}$ can be constructed by adding a "half-edge" to each variable vertex
of $G$. As a factor graph, $\mathcal{G}$ uses a different convention: edges and half edges are interpreted as
"variables" and vertices are interpreted as local functions; a variable is an argument of the
function if and only if the corresponding edge or half edge is incident on the corresponding
vertex. We now define each variable and local function in $\mathcal{G}$.

• Each local function (or vertex) in $\mathcal{G}$ corresponding to variable vertex $x_v$ in $G$ will be denoted
by $g_v(\cdot)$, and referred to as a left function.

• Each local function (or vertex) in $\mathcal{G}$ corresponding to function vertex $\Gamma_c$ will be denoted
by $f_c(\cdot)$, and referred to as a right function.

• The half edge incident on $g_v$ represents variable $y_v$, referred to as a side, taking values from
$(\chi^*)^{\{v\}}$.

• The edge connecting left function $g_v$ and right function $f_c$ represents variable $s_{v,c}$, referred
to as a state, taking values from $(\chi^*)^{\{v\}} \times (\chi^*)^{\{v\}}$. We will also write state $s_{v,c}$ as the pair
$(s^{L}_{v,c}, s^{R}_{v,c})$ of left state $s^{L}_{v,c}$ and right state $s^{R}_{v,c}$.

• Left function $g_v$ for $v \in V$ is defined as

\[
g_v(y_v, s_{v,C(v)}) := \omega_v\Big(y_v \,\Big|\, \bigcap_{c\in C(v)} s^{R}_{v,c}\Big) \cdot \prod_{c\in C(v)} [s^{L}_{v,c} = y_v], \tag{69}
\]

where $s_{v,C(v)}$ is the short-hand notation for $(s_{v,c})_{c\in C(v)}$ and $\omega_v$ is an obedience conditional
on $(\chi^*)^{\{v\}}$.

• Right function $f_c$ for each $c \in C$ is defined as

\[
f_c(s_{V(c),c}) := \prod_{v\in V(c)} \big[s^{R}_{v,c} = F_c(s^{L}_{V(c)\setminus\{v\},c})\big], \tag{70}
\]

where $s_{V(c),c}$ is the short-hand notation for $(s_{v,c})_{v\in V(c)}$.

• The global function represented by $\mathcal{G}$ is

\[
F(y_V, s_{V,C}) := \prod_{v\in V} g_v(y_v, s_{v,C(v)}) \cdot \prod_{c\in C} f_c(s_{V(c),c}), \tag{71}
\]

where $s_{V,C}$ is the short-hand notation for $\{s_{v,c} : \forall (v - c)\}$.

It is clear that, upon normalization, function $F$ may represent a PMF and that the factorization of
$F$ encoded by $\mathcal{G}$ realizes an MRF. An example of such a normally realized MRF, corresponding
to the toy 3-SAT problem in Figure 1, is given in Fig. 4.

Using the "intention-command" analogy, one may view that for any $v$, both $y_v$ and each left
state $s^{L}_{v,c}$ store the intention of variable $x_v$, and that for any given $c$, each right state $s^{R}_{v,c}$ stores
the command of constraint $\Gamma_c$ sent to variable $x_v$. The intention of variable $x_v$ depends on the
intersection of all incoming commands probabilistically via the obedience conditional $\omega_v$. The
command of $\Gamma_c$ sent to each variable $x_v$ needs to equal the token forced by the rectangle formed
by the intentions from all other neighboring variables.

We say that a configuration of $(y_V, s_{V,C})$ is valid under $F$ if it is in the support of function $F$
(namely, if it gives rise to a non-zero value of function $F$). Further, rectangle $y_V$ is said to be
valid under $F$ if there exists a configuration of $s_{V,C}$ such that $(y_V, s_{V,C})$ is valid under $F$. Then
it immediately follows that the PMF represented by MRF $\mathcal{G}$, upon marginalizing over states
$s_{V,C}$, characterizes the set of all valid rectangles under $F$ (via the support of the marginal of $F$
on $y_V$). We now give an intuitive explanation of the MRF defining the distribution of rectangle
$y_V$.

A simple property of such MRFs is given in the following lemma, which immediately follows 
from the definition of the left functions. 

Lemma 7: If configuration $(y_V, s_{V,C})$ is valid under $F$, then it holds for every $(v - c)$ that $s^{L}_{v,c} \subseteq s^{R}_{v,c}$.

Now we consider applying the BP message-update rule on the Forney graph $\mathcal{G}$ we just defined,
where we will use $\rho_{c\to v}$ (referred to as a right message) and $\lambda_{v\to c}$ (referred to as a left message)
to denote the message passed from a right function $f_c$ to a left function $g_v$, and the message
passed from left function $g_v$ to right function $f_c$ respectively, and use $\mu_v$ to denote the summary
message at variable $y_v$. We note that both right message $\rho_{c\to v}$ and left message $\lambda_{v\to c}$ are functions
on the state space $(\chi^*)^{\{v\}} \times (\chi^*)^{\{v\}}$.

Fig. 4. The Forney graph representing the normal realization of the toy problem in Figure 1.

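Lemma 8: The BP message-update rule on Forney graph $\mathcal{G}$ is:

\begin{align}
\lambda_{v\to c}(s^{L}_{v,c}, s^{R}_{v,c}) &:= \sum_{s^{R}_{v,C(v)\setminus\{c\}}} \omega_v\Big(s^{L}_{v,c} \,\Big|\, \bigcap_{b\in C(v)} s^{R}_{v,b}\Big) \prod_{b\in C(v)\setminus\{c\}} \rho_{b\to v}(s^{L}_{v,c}, s^{R}_{v,b}) \tag{72}\\
\rho_{c\to v}(s^{L}_{v,c}, s^{R}_{v,c}) &:= \sum_{s^{L}_{V(c)\setminus\{v\},c}} \big[s^{R}_{v,c} = F_c(s^{L}_{V(c)\setminus\{v\},c})\big] \prod_{u\in V(c)\setminus\{v\}} \lambda_{u\to c}\big(s^{L}_{u,c}, F_c(s^{L}_{V(c)\setminus\{u\},c})\big) \tag{73}\\
\mu_v(y_v) &:= \sum_{s^{R}_{v,C(v)}} \omega_v\Big(y_v \,\Big|\, \bigcap_{c\in C(v)} s^{R}_{v,c}\Big) \prod_{c\in C(v)} \rho_{c\to v}(y_v, s^{R}_{v,c}) \tag{74}
\end{align}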

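Before proving this lemma, it is useful to note the following elementary results.
Lemma 9: 1) For any function $\phi$,

\[
\sum_{y} \phi(x, y) \cdot [y = z] = \phi(x, z). \tag{75}
\]

2) For any collection of functions $\phi_1, \phi_2, \ldots, \phi_m$,

\[
\sum_{x_1, \ldots, x_m} \prod_{i=1}^{m} \phi_i(x_i) = \prod_{i=1}^{m} \sum_{x_i} \phi_i(x_i). \tag{76}
\]

We now prove Lemma 8.
Proof: For the left message,

\begin{align*}
\lambda_{v\to c}(s^{L}_{v,c}, s^{R}_{v,c})
&= \sum_{y_v} \sum_{s_{v,C(v)\setminus\{c\}}} \omega_v\Big(y_v \,\Big|\, \bigcap_{b\in C(v)} s^{R}_{v,b}\Big) \prod_{b\in C(v)} [s^{L}_{v,b} = y_v] \prod_{b\in C(v)\setminus\{c\}} \rho_{b\to v}(s^{L}_{v,b}, s^{R}_{v,b})\\
&= \sum_{y_v} [s^{L}_{v,c} = y_v] \sum_{s^{R}_{v,C(v)\setminus\{c\}}} \omega_v\Big(y_v \,\Big|\, \bigcap_{b\in C(v)} s^{R}_{v,b}\Big) \prod_{b\in C(v)\setminus\{c\}} \sum_{s^{L}_{v,b}} \Big(\rho_{b\to v}(s^{L}_{v,b}, s^{R}_{v,b}) \cdot [s^{L}_{v,b} = y_v]\Big)\\
&= \sum_{y_v} [s^{L}_{v,c} = y_v] \sum_{s^{R}_{v,C(v)\setminus\{c\}}} \omega_v\Big(y_v \,\Big|\, \bigcap_{b\in C(v)} s^{R}_{v,b}\Big) \prod_{b\in C(v)\setminus\{c\}} \rho_{b\to v}(y_v, s^{R}_{v,b})\\
&= \sum_{s^{R}_{v,C(v)\setminus\{c\}}} \omega_v\Big(s^{L}_{v,c} \,\Big|\, \bigcap_{b\in C(v)} s^{R}_{v,b}\Big) \prod_{b\in C(v)\setminus\{c\}} \rho_{b\to v}(s^{L}_{v,c}, s^{R}_{v,b}).
\end{align*}

For the right message,

\begin{align*}
\rho_{c\to v}(s^{L}_{v,c}, s^{R}_{v,c})
&= \sum_{s_{V(c)\setminus\{v\},c}} \prod_{u\in V(c)} \big[s^{R}_{u,c} = F_c(s^{L}_{V(c)\setminus\{u\},c})\big] \prod_{u\in V(c)\setminus\{v\}} \lambda_{u\to c}(s^{L}_{u,c}, s^{R}_{u,c})\\
&= \sum_{s^{L}_{V(c)\setminus\{v\},c}} \big[s^{R}_{v,c} = F_c(s^{L}_{V(c)\setminus\{v\},c})\big] \prod_{u\in V(c)\setminus\{v\}} \sum_{s^{R}_{u,c}} \big[s^{R}_{u,c} = F_c(s^{L}_{V(c)\setminus\{u\},c})\big]\, \lambda_{u\to c}(s^{L}_{u,c}, s^{R}_{u,c})\\
&= \sum_{s^{L}_{V(c)\setminus\{v\},c}} \big[s^{R}_{v,c} = F_c(s^{L}_{V(c)\setminus\{v\},c})\big] \prod_{u\in V(c)\setminus\{v\}} \lambda_{u\to c}\big(s^{L}_{u,c}, F_c(s^{L}_{V(c)\setminus\{u\},c})\big).
\end{align*}

For the summary message,

\begin{align*}
\mu_v(y_v)
&= \sum_{s_{v,C(v)}} \omega_v\Big(y_v \,\Big|\, \bigcap_{c\in C(v)} s^{R}_{v,c}\Big) \prod_{c\in C(v)} [s^{L}_{v,c} = y_v] \prod_{c\in C(v)} \rho_{c\to v}(s^{L}_{v,c}, s^{R}_{v,c})\\
&= \sum_{s^{R}_{v,C(v)}} \omega_v\Big(y_v \,\Big|\, \bigcap_{c\in C(v)} s^{R}_{v,c}\Big) \prod_{c\in C(v)} \rho_{c\to v}(y_v, s^{R}_{v,c}).
\end{align*}

In each case the sums over states are separated using part 2 of Lemma 9 and the indicator
functions are eliminated using part 1 of Lemma 9.

B. Weighted PTP as BP for k-SAT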

Now we show that for $k$-SAT problems, weighted PTP is an instance of BP when the
parametrization of weighted PTP is consistent with the parametrization of the normally realized
MRF from which BP is derived.

We begin by introducing a simplification of notation. For any $(v - c)$ and edge label $L_{v,c}$,
we will write $\mathbf{L}_{v,c}$ as $L$, and $\bar{\mathbf{L}}_{v,c}$ as $\bar{L}$. This suppression of the dependency of $\mathbf{L}_{v,c}$ and $\bar{\mathbf{L}}_{v,c}$
on their subscripts should not result in any ambiguity when the context clearly indicates the
subscript $(v, c)$ or the edge to which the edge label $L_{v,c}$ refers. Additionally, for any $v \in V$, we
will write $01_v$ as $*$. Thus, each left or right state will take configurations from the set $\{L, \bar{L}, *\}$,
where the interpretation of $L$ and $\bar{L}$ depends on the edge with which the state is associated. For
any given configuration of a state $(s^{L}_{v,c}, s^{R}_{v,c})$, we will suppress the comma between the left-state
configuration and the right-state configuration. For example, state configurations $(L, *)$, $(\bar{L}, *)$,
$(*, *)$ and $(L, L)$ will be written respectively as $L*$, $\bar{L}*$, $**$ and $LL$.

Lemma 10: Let $F$ be defined via (69), (70) and (71), where each weighting function $\omega_v$ is
defined in (50). If $(y_V, s_{V,C})$ is valid under $F$, then

1) for every $(v - c)$, it holds that $s^{R}_{v,c} \neq \bar{L}$, $s_{v,c} \neq \bar{L}L$ and that $s_{v,c} \neq *L$, and

2) $F(y_V, s_{V,C}) = \gamma^{n_*(y_V, s_{V,C})} \cdot (1 - \gamma)^{n_o(y_V, s_{V,C})}$, where $n_*(y_V, s_{V,C})$ and $n_o(y_V, s_{V,C})$ are
respectively the cardinalities of the set $\{v \in V : y_v = \bigcap_{c\in C(v)} s^{R}_{v,c} = *\}$ and the set
$\{v \in V : y_v \subset \bigcap_{c\in C(v)} s^{R}_{v,c} = *\}$.

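Proof: For part 1, we first observe that $s^{R}_{v,c} \neq \bar{L}$, which follows directly from the definition of
the right function (70). Then by Lemma 7 it is easy to see that $s_{v,c} \neq \bar{L}L$ and that $s_{v,c} \neq *L$.
For part 2, we may proceed as follows.

\begin{align*}
F(y_V, s_{V,C}) &= \prod_{v\in V} g_v(y_v, s_{v,C(v)}) \cdot \prod_{c\in C} f_c(s_{V(c),c})\\
&\overset{(69),(70)}{=} \prod_{v\in V} \Big(\omega_v\Big(y_v \,\Big|\, \bigcap_{c\in C(v)} s^{R}_{v,c}\Big) \prod_{c\in C(v)} [s^{L}_{v,c} = y_v]\Big) \cdot \prod_{c\in C} \prod_{v\in V(c)} \big[s^{R}_{v,c} = F_c(s^{L}_{V(c)\setminus\{v\},c})\big]\\
&\overset{(a)}{=} \prod_{v\in V} \omega_v\Big(y_v \,\Big|\, \bigcap_{c\in C(v)} s^{R}_{v,c}\Big)\\
&\overset{(b)}{=} \gamma^{n_*(y_V, s_{V,C})} \cdot (1-\gamma)^{n_o(y_V, s_{V,C})},
\end{align*}

where equality (a) is due to the fact that $(y_V, s_{V,C})$ is valid under $F$, and equality (b) follows
from the definition of the weighting function $\omega_v\big(y_v \mid \bigcap_{c\in C(v)} s^{R}_{v,c}\big)$ in (50). ■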

The second part of this lemma, as a slight digression, suggests that the PMF under this MRF
model is identical to that of [15], since an equivalent result is shown for the MRF in [15].
We note that the MRF in [15] serves as a combinatorial framework for the study of $k$-SAT
problems, which leads to further insights into SP for $k$-SAT problems (the reader is referred to
[15] for additional results). To a certain extent, one may expect that the normally realized MRF
presented here may serve similar purposes for general CSPs.

The first part of this lemma suggests that although each state takes on values from
$\{L, \bar{L}, *\} \times \{L, \bar{L}, *\}$, there are in fact only four possible state configurations that contribute to defining a
valid rectangle. When applying the BP message-update rule on the Forney graph representation
of a $k$-SAT problem, this implies that messages $\lambda_{v\to c}$, $\rho_{c\to v}$ and $\mu_v$ are all supported by
$\{LL, L*, \bar{L}*, **\}$.

The BP message-update rule is given in Lemma 11, which directly follows from equations
(72) to (74).

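Lemma 11: The BP message-update rule applied on Forney graph $\mathcal{G}$ of a $k$-SAT problem
gives rise to the following, where $C^{s}_{c}(v)$ and $C^{u}_{c}(v)$ denote the constraints $b \in C(v)\setminus\{c\}$ whose edge
label $L_{v,b}$ agrees, respectively disagrees, with $L_{v,c}$, and $C^{0}(v)$ and $C^{1}(v)$ denote the constraints in
$C(v)$ whose edge label at $v$ is 0 and 1 respectively:

\begin{align}
\lambda_{v\to c}(LL) &:= \prod_{b\in C^{u}_{c}(v)} \rho_{b\to v}(\bar{L}*) \prod_{b\in C^{s}_{c}(v)} \big(\rho_{b\to v}(LL)+\rho_{b\to v}(L*)\big) \tag{77}\\
\lambda_{v\to c}(\bar{L}*) &:= \prod_{b\in C^{s}_{c}(v)} \rho_{b\to v}(\bar{L}*) \Big(\prod_{b\in C^{u}_{c}(v)} \big(\rho_{b\to v}(L*)+\rho_{b\to v}(LL)\big) - \gamma \prod_{b\in C^{u}_{c}(v)} \rho_{b\to v}(L*)\Big) \tag{78}\\
\lambda_{v\to c}(L*) &:= \prod_{b\in C^{u}_{c}(v)} \rho_{b\to v}(\bar{L}*) \Big(\prod_{b\in C^{s}_{c}(v)} \big(\rho_{b\to v}(L*)+\rho_{b\to v}(LL)\big) - \gamma \prod_{b\in C^{s}_{c}(v)} \rho_{b\to v}(L*)\Big) \tag{79}\\
\lambda_{v\to c}(**) &:= \gamma \prod_{b\in C(v)\setminus\{c\}} \rho_{b\to v}(**) \tag{80}\\
\rho_{c\to v}(LL) &:= \prod_{u\in V(c)\setminus\{v\}} \lambda_{u\to c}(\bar{L}*) \tag{81}\\
\rho_{c\to v}(\bar{L}*) &:= \prod_{u\in V(c)\setminus\{v\}} \big(\lambda_{u\to c}(L*) + \lambda_{u\to c}(**) + \lambda_{u\to c}(\bar{L}*)\big) \notag\\
&\quad + \sum_{u\in V(c)\setminus\{v\}} \big(\lambda_{u\to c}(LL) - \lambda_{u\to c}(L*) - \lambda_{u\to c}(**)\big) \prod_{w\in V(c)\setminus\{u,v\}} \lambda_{w\to c}(\bar{L}*) \notag\\
&\quad - \prod_{u\in V(c)\setminus\{v\}} \lambda_{u\to c}(\bar{L}*) \tag{82}\\
\rho_{c\to v}(L*) &:= \prod_{u\in V(c)\setminus\{v\}} \big(\lambda_{u\to c}(L*) + \lambda_{u\to c}(**) + \lambda_{u\to c}(\bar{L}*)\big) - \prod_{u\in V(c)\setminus\{v\}} \lambda_{u\to c}(\bar{L}*) \tag{83}\\
\rho_{c\to v}(**) &:= \prod_{u\in V(c)\setminus\{v\}} \big(\lambda_{u\to c}(L*) + \lambda_{u\to c}(**) + \lambda_{u\to c}(\bar{L}*)\big) - \prod_{u\in V(c)\setminus\{v\}} \lambda_{u\to c}(\bar{L}*) \tag{84}\\
\mu_v(0) &:= \prod_{c\in C^{1}(v)} \rho_{c\to v}(\bar{L}*) \Big(\prod_{c\in C^{0}(v)} \big(\rho_{c\to v}(LL)+\rho_{c\to v}(L*)\big) - \gamma \prod_{c\in C^{0}(v)} \rho_{c\to v}(L*)\Big) \tag{85}\\
\mu_v(1) &:= \prod_{c\in C^{0}(v)} \rho_{c\to v}(\bar{L}*) \Big(\prod_{c\in C^{1}(v)} \big(\rho_{c\to v}(LL)+\rho_{c\to v}(L*)\big) - \gamma \prod_{c\in C^{1}(v)} \rho_{c\to v}(L*)\Big) \tag{86}\\
\mu_v(*) &:= \gamma \prod_{c\in C(v)} \rho_{c\to v}(**). \tag{87}
\end{align}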

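Now we are ready to investigate how these BP messages may be reduced to (weighted) PTP
messages. It turns out that the following condition has a special role to play in this reduction.

\[
\rho_{c\to v}(\bar{L}*) = \rho_{c\to v}(L*) = \rho_{c\to v}(**) \tag{88}
\]

Proposition 1: In $k$-SAT problems, if the BP messages are initialized to satisfy condition (88),
then this condition is satisfied in every BP iteration.

Proof: We only need to show that if (88) is satisfied during initialization, then it is satisfied
in the first iteration after initialization. In fact, noting that $\rho_{c\to v}(L*) = \rho_{c\to v}(**)$ necessarily
holds in each BP iteration due to (83) and (84), we only need to prove that $\rho_{c\to v}(\bar{L}*) = \rho_{c\to v}(L*)$
holds in the first iteration provided the BP messages are initialized to satisfy (88).

Under this initialization condition, we have, in the first BP iteration after initialization,

\begin{align*}
\lambda_{v\to c}(L*) + \lambda_{v\to c}(**)
&= \prod_{b\in C^{u}_{c}(v)} \rho_{b\to v}(\bar{L}*) \times \Big(\prod_{b\in C^{s}_{c}(v)} \big(\rho_{b\to v}(L*) + \rho_{b\to v}(LL)\big) - \gamma \prod_{b\in C^{s}_{c}(v)} \rho_{b\to v}(L*)\Big) + \gamma \prod_{b\in C(v)\setminus\{c\}} \rho_{b\to v}(**)\\
&= \prod_{b\in C^{u}_{c}(v)} \rho_{b\to v}(\bar{L}*) \prod_{b\in C^{s}_{c}(v)} \big(\rho_{b\to v}(L*) + \rho_{b\to v}(LL)\big)\\
&\quad - \gamma \prod_{b\in C^{u}_{c}(v)} \rho_{b\to v}(\bar{L}*) \prod_{b\in C^{s}_{c}(v)} \rho_{b\to v}(L*) + \gamma \prod_{b\in C^{u}_{c}(v)\cup C^{s}_{c}(v)} \rho_{b\to v}(**)\\
&\overset{(a)}{=} \prod_{b\in C^{u}_{c}(v)} \rho_{b\to v}(\bar{L}*) \prod_{b\in C^{s}_{c}(v)} \big(\rho_{b\to v}(L*) + \rho_{b\to v}(LL)\big)\\
&= \lambda_{v\to c}(LL),
\end{align*}

where equality (a) is due to the initialization condition (88), and $C^{s}_{c}(v)$ and $C^{u}_{c}(v)$ denote the
constraints $b \in C(v)\setminus\{c\}$ whose edge label $L_{v,b}$ agrees, respectively disagrees, with $L_{v,c}$.
Then in the subsequent update of the right messages, we have

\begin{align*}
\rho_{c\to v}(\bar{L}*) &= \prod_{u\in V(c)\setminus\{v\}} \big(\lambda_{u\to c}(L*) + \lambda_{u\to c}(**) + \lambda_{u\to c}(\bar{L}*)\big)\\
&\quad + \sum_{u\in V(c)\setminus\{v\}} \big(\lambda_{u\to c}(LL) - \lambda_{u\to c}(L*) - \lambda_{u\to c}(**)\big) \prod_{w\in V(c)\setminus\{u,v\}} \lambda_{w\to c}(\bar{L}*)\\
&\quad - \prod_{u\in V(c)\setminus\{v\}} \lambda_{u\to c}(\bar{L}*)\\
&\overset{(b)}{=} \prod_{u\in V(c)\setminus\{v\}} \big(\lambda_{u\to c}(L*) + \lambda_{u\to c}(**) + \lambda_{u\to c}(\bar{L}*)\big) - \prod_{u\in V(c)\setminus\{v\}} \lambda_{u\to c}(\bar{L}*)\\
&= \rho_{c\to v}(L*),
\end{align*}

where equality (b) is due to the above result $\lambda_{v\to c}(LL) = \lambda_{v\to c}(L*) + \lambda_{v\to c}(**)$.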
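The two steps of this proof can be checked numerically on a single variable and a single clause. The sketch below is illustrative and rests on assumptions made here: messages are dictionaries keyed by the four state configurations, with the key "nL*" standing for $\bar{L}*$, and the incoming right messages are already split into those from constraints whose edge label at $v$ agrees with $L_{v,c}$ and those whose label disagrees.

```python
from math import prod


def left_update(rho_same, rho_opp, gamma):
    """Left messages in the k-SAT BP rule, equations (77)-(80).

    rho_same / rho_opp: right messages from constraints b != c whose edge
    label at v equals / differs from L_{v,c}.
    """
    def singleton(blocking, compatible):
        # blocking constraints must not issue a command; compatible ones may.
        return prod(r["nL*"] for r in blocking) * (
            prod(r["L*"] + r["LL"] for r in compatible)
            - gamma * prod(r["L*"] for r in compatible)
        )

    return {
        "LL": prod(r["nL*"] for r in rho_opp)
              * prod(r["LL"] + r["L*"] for r in rho_same),
        "L*": singleton(rho_opp, rho_same),
        "nL*": singleton(rho_same, rho_opp),
        "**": gamma * prod(r["**"] for r in rho_same + rho_opp),
    }


def right_update(lams):
    """Right messages, equations (81)-(84); lams come from u in V(c) minus {v}."""
    total = prod(l["L*"] + l["**"] + l["nL*"] for l in lams)
    all_negated = prod(l["nL*"] for l in lams)
    one_forced = sum(
        (l["LL"] - l["L*"] - l["**"])
        * prod(m["nL*"] for j, m in enumerate(lams) if j != i)
        for i, l in enumerate(lams)
    )
    return {
        "LL": all_negated,
        "nL*": total + one_forced - all_negated,
        "L*": total - all_negated,
        "**": total - all_negated,
    }
```

Feeding `left_update` with right messages that satisfy (88) should produce left messages with the "LL" entry equal to the sum of the "L*" and "**" entries, and passing such left messages to `right_update` should return a right message that again satisfies (88).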
Theorem 3: In a $k$-SAT problem, suppose that the following two conditions are imposed on
the BP messages.

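1) For every $(v - c)$, the BP messages are initialized such that (88) is satisfied.

2) In each BP iteration, $\lambda_{v\to c}$ is scaled to $\lambda^{\rm norm}_{v\to c}$ such that
$\lambda^{\rm norm}_{v\to c}(L*) + \lambda^{\rm norm}_{v\to c}(\bar{L}*) + \lambda^{\rm norm}_{v\to c}(**) = 1$ before
it is passed along the edge; that is,
$\lambda^{\rm norm}_{v\to c}(s^{L}_{v,c}, s^{R}_{v,c}) := \lambda_{v\to c}(s^{L}_{v,c}, s^{R}_{v,c}) / \big(\lambda_{v\to c}(L*) + \lambda_{v\to c}(\bar{L}*) + \lambda_{v\to c}(**)\big)$
for every $(s^{L}_{v,c}, s^{R}_{v,c})$ in the support of $\lambda_{v\to c}$, and the right messages are updated based on the
normalized left messages, namely,

\begin{align}
\rho_{c\to v}(LL) &:= \prod_{u\in V(c)\setminus\{v\}} \lambda^{\rm norm}_{u\to c}(\bar{L}*) \tag{89}\\
\rho_{c\to v}(\bar{L}*) &:= \prod_{u\in V(c)\setminus\{v\}} \big(\lambda^{\rm norm}_{u\to c}(L*) + \lambda^{\rm norm}_{u\to c}(**) + \lambda^{\rm norm}_{u\to c}(\bar{L}*)\big) \notag\\
&\quad + \sum_{u\in V(c)\setminus\{v\}} \big(\lambda^{\rm norm}_{u\to c}(LL) - \lambda^{\rm norm}_{u\to c}(L*) - \lambda^{\rm norm}_{u\to c}(**)\big) \prod_{w\in V(c)\setminus\{u,v\}} \lambda^{\rm norm}_{w\to c}(\bar{L}*) \notag\\
&\quad - \prod_{u\in V(c)\setminus\{v\}} \lambda^{\rm norm}_{u\to c}(\bar{L}*) \tag{90}\\
\rho_{c\to v}(L*) &:= \prod_{u\in V(c)\setminus\{v\}} \big(\lambda^{\rm norm}_{u\to c}(L*) + \lambda^{\rm norm}_{u\to c}(**) + \lambda^{\rm norm}_{u\to c}(\bar{L}*)\big) - \prod_{u\in V(c)\setminus\{v\}} \lambda^{\rm norm}_{u\to c}(\bar{L}*) \tag{91}\\
\rho_{c\to v}(**) &:= \prod_{u\in V(c)\setminus\{v\}} \big(\lambda^{\rm norm}_{u\to c}(L*) + \lambda^{\rm norm}_{u\to c}(**) + \lambda^{\rm norm}_{u\to c}(\bar{L}*)\big) - \prod_{u\in V(c)\setminus\{v\}} \lambda^{\rm norm}_{u\to c}(\bar{L}*). \tag{92}
\end{align}

Then the correspondence between BP messages and weighted PTP messages is

\begin{align}
\lambda^{\rm norm(BP)}_{v\to c}(L*) &\leftrightarrow [L_{v,c}=0]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(0) + [L_{v,c}=1]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(1) \tag{93}\\
\lambda^{\rm norm(BP)}_{v\to c}(\bar{L}*) &\leftrightarrow [L_{v,c}=0]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(1) + [L_{v,c}=1]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(0) \tag{94}\\
\lambda^{\rm norm(BP)}_{v\to c}(**) &\leftrightarrow \lambda^{\rm norm(PTP)}_{v\to c}(*) \tag{95}\\
\rho^{\rm (BP)}_{c\to v}(LL) &\leftrightarrow \rho^{\rm norm(PTP)}_{c\to v}(0) + \rho^{\rm norm(PTP)}_{c\to v}(1) \tag{96}\\
\rho^{\rm (BP)}_{c\to v}(L*) &\leftrightarrow \rho^{\rm norm(PTP)}_{c\to v}(*) \tag{97}\\
\mu^{\rm (BP)}_{v}(0) &\leftrightarrow \mu^{\rm (PTP)}_{v}(0) \tag{98}\\
\mu^{\rm (BP)}_{v}(1) &\leftrightarrow \mu^{\rm (PTP)}_{v}(1) \tag{99}\\
\mu^{\rm (BP)}_{v}(*) &\leftrightarrow \mu^{\rm (PTP)}_{v}(*). \tag{100}
\end{align}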



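Proof:
Note that based on Proposition 1, condition $\rho^{\rm (BP)}_{c\to v}(\bar{L}*) = \rho^{\rm (BP)}_{c\to v}(L*) = \rho^{\rm (BP)}_{c\to v}(**)$ holds in
every BP iteration. From the proof of Proposition 1, it also holds in every BP iteration that

\[
\lambda^{\rm norm(BP)}_{v\to c}(L*) + \lambda^{\rm norm(BP)}_{v\to c}(**) = \lambda^{\rm norm(BP)}_{v\to c}(LL). \tag{101}
\]

Now we will prove this theorem by first proving that the "left correspondence" ((93) to (95))
implies the "right correspondence" ((96) and (97)) and conversely that the "right correspondence"
implies the "left correspondence", thereby proving the correspondence in the passed messages.
We then prove the summary correspondence ((98) to (100)).

First suppose that the left correspondence holds, namely that
$\lambda^{\rm norm(BP)}_{v\to c}(L*) = [L_{v,c}=0]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(0) + [L_{v,c}=1]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(1)$,
$\lambda^{\rm norm(BP)}_{v\to c}(\bar{L}*) = [L_{v,c}=0]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(1) + [L_{v,c}=1]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(0)$, and
$\lambda^{\rm norm(BP)}_{v\to c}(**) = \lambda^{\rm norm(PTP)}_{v\to c}(*)$. Following the PTP message-update equations
(54) to (56), we have

\begin{align*}
\rho^{\rm norm(PTP)}_{c\to v}(0) + \rho^{\rm norm(PTP)}_{c\to v}(1)
\overset{(a)}{=}{}& \rho^{\rm (PTP)}_{c\to v}(0) + \rho^{\rm (PTP)}_{c\to v}(1)\\
={}& [L_{v,c}=0]\cdot \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=1} \lambda^{\rm norm(PTP)}_{u\to c}(0) \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=0} \lambda^{\rm norm(PTP)}_{u\to c}(1)\\
&+ [L_{v,c}=1]\cdot \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=1} \lambda^{\rm norm(PTP)}_{u\to c}(0) \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=0} \lambda^{\rm norm(PTP)}_{u\to c}(1)\\
={}& \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=1} \lambda^{\rm norm(PTP)}_{u\to c}(0) \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=0} \lambda^{\rm norm(PTP)}_{u\to c}(1)\\
={}& \prod_{u\in V(c)\setminus\{v\}} \Big([L_{u,c}=0]\cdot\lambda^{\rm norm(PTP)}_{u\to c}(1) + [L_{u,c}=1]\cdot\lambda^{\rm norm(PTP)}_{u\to c}(0)\Big)\\
\overset{(b)}{=}{}& \prod_{u\in V(c)\setminus\{v\}} \lambda^{\rm norm(BP)}_{u\to c}(\bar{L}*)\\
={}& \rho^{\rm (BP)}_{c\to v}(LL),
\end{align*}

where equality (a) is due to the fact that $\rho^{\rm norm(PTP)}_{c\to v} = \rho^{\rm (PTP)}_{c\to v}$, as is shown in the proof of Theorem
2, and equality (b) is due to the assumed left correspondence.

Similarly, we have

\begin{align*}
\rho^{\rm norm(PTP)}_{c\to v}(*)
&= 1 - \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=1} \lambda^{\rm norm(PTP)}_{u\to c}(0) \prod_{u\in V(c)\setminus\{v\}:L_{u,c}=0} \lambda^{\rm norm(PTP)}_{u\to c}(1)\\
&= 1 - \prod_{u\in V(c)\setminus\{v\}} \lambda^{\rm norm(BP)}_{u\to c}(\bar{L}*)\\
&\overset{(c)}{=} \prod_{u\in V(c)\setminus\{v\}} \big(\lambda^{\rm norm(BP)}_{u\to c}(L*) + \lambda^{\rm norm(BP)}_{u\to c}(**) + \lambda^{\rm norm(BP)}_{u\to c}(\bar{L}*)\big) - \prod_{u\in V(c)\setminus\{v\}} \lambda^{\rm norm(BP)}_{u\to c}(\bar{L}*)\\
&= \rho^{\rm (BP)}_{c\to v}(L*),
\end{align*}

where equality (c) is due to the fact that
$\lambda^{\rm norm(BP)}_{u\to c}(L*) + \lambda^{\rm norm(BP)}_{u\to c}(\bar{L}*) + \lambda^{\rm norm(BP)}_{u\to c}(**) = 1$.

Thus we have proved that if the left correspondence holds, then the right correspondence holds.

Now suppose that the right correspondence holds, namely that $\rho^{\rm (BP)}_{c\to v}(L*) = \rho^{\rm norm(PTP)}_{c\to v}(*)$,
and $\rho^{\rm (BP)}_{c\to v}(LL) = \rho^{\rm norm(PTP)}_{c\to v}(0) + \rho^{\rm norm(PTP)}_{c\to v}(1)$. We then have

\[
\rho^{\rm (BP)}_{c\to v}(L*) + \rho^{\rm (BP)}_{c\to v}(LL) = \rho^{\rm norm(PTP)}_{c\to v}(*) + \rho^{\rm norm(PTP)}_{c\to v}(0) + \rho^{\rm norm(PTP)}_{c\to v}(1) = 1.
\]

Following the PTP message-update equations (51) to (53), and writing $C^{s}_{c}(v)$ and $C^{u}_{c}(v)$ for the
constraints $b \in C(v)\setminus\{c\}$ whose edge label $L_{v,b}$ agrees, respectively disagrees, with $L_{v,c}$, we have

\begin{align*}
& [L_{v,c}=0]\cdot\lambda^{\rm (PTP)}_{v\to c}(0) + [L_{v,c}=1]\cdot\lambda^{\rm (PTP)}_{v\to c}(1)\\
={}& [L_{v,c}=0]\cdot\Big(\prod_{b\in C(v)\setminus\{c\}} \big(\rho^{\rm norm(PTP)}_{b\to v}(0)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big) - \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm norm(PTP)}_{b\to v}(*)\Big)\\
&+ [L_{v,c}=1]\cdot\Big(\prod_{b\in C(v)\setminus\{c\}} \big(\rho^{\rm norm(PTP)}_{b\to v}(1)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big) - \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm norm(PTP)}_{b\to v}(*)\Big)\\
={}& [L_{v,c}=0]\cdot\prod_{b\in C(v)\setminus\{c\}} \big(\rho^{\rm norm(PTP)}_{b\to v}(0)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big)
+ [L_{v,c}=1]\cdot\prod_{b\in C(v)\setminus\{c\}} \big(\rho^{\rm norm(PTP)}_{b\to v}(1)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big)\\
&- \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm norm(PTP)}_{b\to v}(*)\\
={}& [L_{v,c}=0]\cdot\prod_{b\in C^{s}_{c}(v)} \big(\rho^{\rm norm(PTP)}_{b\to v}(0)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big) \prod_{b\in C^{u}_{c}(v)} \rho^{\rm norm(PTP)}_{b\to v}(*)\\
&+ [L_{v,c}=1]\cdot\prod_{b\in C^{s}_{c}(v)} \big(\rho^{\rm norm(PTP)}_{b\to v}(1)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big) \prod_{b\in C^{u}_{c}(v)} \rho^{\rm norm(PTP)}_{b\to v}(*)
- \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm norm(PTP)}_{b\to v}(*)\\
\overset{(67)}{=}{}& [L_{v,c}=0]\cdot\prod_{b\in C^{u}_{c}(v)} \rho^{\rm norm(PTP)}_{b\to v}(*) + [L_{v,c}=1]\cdot\prod_{b\in C^{u}_{c}(v)} \rho^{\rm norm(PTP)}_{b\to v}(*) - \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm norm(PTP)}_{b\to v}(*)\\
={}& \prod_{b\in C^{u}_{c}(v)} \rho^{\rm norm(PTP)}_{b\to v}(*) - \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm norm(PTP)}_{b\to v}(*)\\
={}& \prod_{b\in C^{u}_{c}(v)} \rho^{\rm norm(PTP)}_{b\to v}(*) \Big(1 - \gamma \prod_{b\in C^{s}_{c}(v)} \rho^{\rm norm(PTP)}_{b\to v}(*)\Big)\\
\overset{(d)}{=}{}& \prod_{b\in C^{u}_{c}(v)} \rho^{\rm (BP)}_{b\to v}(L*) \Big(1 - \gamma \prod_{b\in C^{s}_{c}(v)} \rho^{\rm (BP)}_{b\to v}(L*)\Big)\\
\overset{(e)}{=}{}& \prod_{b\in C^{u}_{c}(v)} \rho^{\rm (BP)}_{b\to v}(L*) \Big(\prod_{b\in C^{s}_{c}(v)} \big(\rho^{\rm (BP)}_{b\to v}(L*) + \rho^{\rm (BP)}_{b\to v}(LL)\big) - \gamma \prod_{b\in C^{s}_{c}(v)} \rho^{\rm (BP)}_{b\to v}(L*)\Big)\\
\overset{(f)}{=}{}& \prod_{b\in C^{u}_{c}(v)} \rho^{\rm (BP)}_{b\to v}(\bar{L}*) \Big(\prod_{b\in C^{s}_{c}(v)} \big(\rho^{\rm (BP)}_{b\to v}(L*) + \rho^{\rm (BP)}_{b\to v}(LL)\big) - \gamma \prod_{b\in C^{s}_{c}(v)} \rho^{\rm (BP)}_{b\to v}(L*)\Big)\\
={}& \lambda^{\rm (BP)}_{v\to c}(L*),
\end{align*}

where equality (d) is due to the assumed right correspondence, equality (e) is due to the fact that
$\rho^{\rm (BP)}_{b\to v}(L*) + \rho^{\rm (BP)}_{b\to v}(LL) = 1$, and equality (f) is due to the fact that the condition
$\rho^{\rm (BP)}_{b\to v}(\bar{L}*) = \rho^{\rm (BP)}_{b\to v}(L*)$ is satisfied in every iteration. We will denote this result by (A).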
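Similarly, we have

\begin{align*}
& [L_{v,c}=0]\cdot\lambda^{\rm (PTP)}_{v\to c}(1) + [L_{v,c}=1]\cdot\lambda^{\rm (PTP)}_{v\to c}(0)\\
={}& [L_{v,c}=0]\cdot\Big(\prod_{b\in C(v)\setminus\{c\}} \big(\rho^{\rm norm(PTP)}_{b\to v}(1)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big) - \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm norm(PTP)}_{b\to v}(*)\Big)\\
&+ [L_{v,c}=1]\cdot\Big(\prod_{b\in C(v)\setminus\{c\}} \big(\rho^{\rm norm(PTP)}_{b\to v}(0)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big) - \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm norm(PTP)}_{b\to v}(*)\Big)\\
={}& \cdots = \lambda^{\rm (BP)}_{v\to c}(\bar{L}*),
\end{align*}

where the omitted steps mirror those leading to result (A). We will denote this result by (B).
Finally, we have

\begin{align*}
\lambda^{\rm (PTP)}_{v\to c}(*) &= \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm norm(PTP)}_{b\to v}(*)\\
&= \gamma \prod_{b\in C(v)\setminus\{c\}} \rho^{\rm (BP)}_{b\to v}(**)\\
&= \lambda^{\rm (BP)}_{v\to c}(**).
\end{align*}

We will denote this result by (C).

Combining results (A), (B) and (C), we have

\[
\lambda^{\rm (PTP)}_{v\to c}(0) + \lambda^{\rm (PTP)}_{v\to c}(1) + \lambda^{\rm (PTP)}_{v\to c}(*) = \lambda^{\rm (BP)}_{v\to c}(L*) + \lambda^{\rm (BP)}_{v\to c}(\bar{L}*) + \lambda^{\rm (BP)}_{v\to c}(**).
\]

That is, the scaling constant for normalizing $(\lambda^{\rm (PTP)}_{v\to c}(0), \lambda^{\rm (PTP)}_{v\to c}(1), \lambda^{\rm (PTP)}_{v\to c}(*))$ and that for
normalizing $(\lambda^{\rm (BP)}_{v\to c}(L*), \lambda^{\rm (BP)}_{v\to c}(\bar{L}*), \lambda^{\rm (BP)}_{v\to c}(**))$ are identical. Therefore, results (A), (B) and (C)
respectively translate to

\begin{align*}
[L_{v,c}=0]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(0) + [L_{v,c}=1]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(1) &= \lambda^{\rm norm(BP)}_{v\to c}(L*)\\
[L_{v,c}=0]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(1) + [L_{v,c}=1]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(0) &= \lambda^{\rm norm(BP)}_{v\to c}(\bar{L}*)\\
\lambda^{\rm norm(PTP)}_{v\to c}(*) &= \lambda^{\rm norm(BP)}_{v\to c}(**).
\end{align*}

At this point we have proved the correspondence between the passed messages in BP and
those in weighted PTP.

We now prove the summary correspondence. Following the PTP message-update equations
(57) to (59), and writing $C^{0}(v)$ and $C^{1}(v)$ for the constraints in $C(v)$ whose edge label at $v$ is 0
and 1 respectively, we have

\begin{align*}
\mu^{\rm (PTP)}_{v}(0) &= \prod_{c\in C^{1}(v)} \big(\rho^{\rm norm(PTP)}_{c\to v}(0)+\rho^{\rm norm(PTP)}_{c\to v}(*)\big) \prod_{c\in C^{0}(v)} \big(\rho^{\rm norm(PTP)}_{c\to v}(0)+\rho^{\rm norm(PTP)}_{c\to v}(*)\big)
- \gamma \prod_{c\in C(v)} \rho^{\rm norm(PTP)}_{c\to v}(*)\\
&\overset{(67),(68)}{=} \prod_{c\in C^{1}(v)} \rho^{\rm norm(PTP)}_{c\to v}(*) - \gamma \prod_{c\in C(v)} \rho^{\rm norm(PTP)}_{c\to v}(*)\\
&= \prod_{c\in C^{1}(v)} \rho^{\rm (BP)}_{c\to v}(\bar{L}*) \Big(\prod_{c\in C^{0}(v)} \big(\rho^{\rm (BP)}_{c\to v}(LL)+\rho^{\rm (BP)}_{c\to v}(L*)\big) - \gamma \prod_{c\in C^{0}(v)} \rho^{\rm (BP)}_{c\to v}(L*)\Big)\\
&= \mu^{\rm (BP)}_{v}(0).
\end{align*}

Following a similar procedure, we have

\begin{align*}
\mu^{\rm (PTP)}_{v}(1) &= \prod_{c\in C(v)} \big(\rho^{\rm norm(PTP)}_{c\to v}(1)+\rho^{\rm norm(PTP)}_{c\to v}(*)\big) - \gamma \prod_{c\in C(v)} \rho^{\rm norm(PTP)}_{c\to v}(*)\\
&= \prod_{c\in C^{0}(v)} \rho^{\rm (BP)}_{c\to v}(\bar{L}*) \Big(\prod_{c\in C^{1}(v)} \big(\rho^{\rm (BP)}_{c\to v}(LL)+\rho^{\rm (BP)}_{c\to v}(L*)\big) - \gamma \prod_{c\in C^{1}(v)} \rho^{\rm (BP)}_{c\to v}(L*)\Big)\\
&= \mu^{\rm (BP)}_{v}(1).
\end{align*}

Finally, we have

\begin{align*}
\mu^{\rm (PTP)}_{v}(*) &= \gamma \prod_{c\in C(v)} \rho^{\rm norm(PTP)}_{c\to v}(*)\\
&= \gamma \prod_{c\in C(v)} \rho^{\rm (BP)}_{c\to v}(**)\\
&= \mu^{\rm (BP)}_{v}(*),
\end{align*}

which proves the summary correspondence.

C. State-Decoupled BP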

In this subsection, we will consider reducing PTP from BP for 3-COL problems, where we
focus only on the non-weighted version of PTP, namely that each weighting function $\omega_v$ is
defined as

\[
\omega_v(a \mid b) := [a = b]. \tag{102}
\]

This gives BP messages of the form specified in the following lemma, easily
obtainable from the BP update equations (72) to (74).

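Lemma 12: The BP message-update rule for 3-COL problems is as follows:

\begin{align}
\lambda_{v\to c}(i, ij) &:= \prod_{b\in C(v)\setminus\{c\}} \big(\rho_{b\to v}(i, ij) + \rho_{b\to v}(i, ik) + \rho_{b\to v}(i, ijk)\big)
- \prod_{b\in C(v)\setminus\{c\}} \big(\rho_{b\to v}(i, ij) + \rho_{b\to v}(i, ijk)\big) \tag{103}\\
\lambda_{v\to c}(i, ijk) &:= \prod_{b\in C(v)\setminus\{c\}} \big(\rho_{b\to v}(i, ij) + \rho_{b\to v}(i, ik) + \rho_{b\to v}(i, ijk)\big)
- \prod_{b\in C(v)\setminus\{c\}} \big(\rho_{b\to v}(i, ij) + \rho_{b\to v}(i, ijk)\big) \notag\\
&\quad - \prod_{b\in C(v)\setminus\{c\}} \big(\rho_{b\to v}(i, ik) + \rho_{b\to v}(i, ijk)\big) + \prod_{b\in C(v)\setminus\{c\}} \rho_{b\to v}(i, ijk) \tag{104}\\
\lambda_{v\to c}(ij, ij) &:= \prod_{b\in C(v)\setminus\{c\}} \big(\rho_{b\to v}(ij, ij) + \rho_{b\to v}(ij, ijk)\big) \tag{105}\\
\lambda_{v\to c}(ij, ijk) &:= \prod_{b\in C(v)\setminus\{c\}} \big(\rho_{b\to v}(ij, ij) + \rho_{b\to v}(ij, ijk)\big) - \prod_{b\in C(v)\setminus\{c\}} \rho_{b\to v}(ij, ijk) \tag{106}\\
\lambda_{v\to c}(ijk, ijk) &:= \prod_{b\in C(v)\setminus\{c\}} \rho_{b\to v}(ijk, ijk) \tag{107}\\
\rho_{c\to v}(i, ij) &:= \lambda_{V(c)\setminus\{v\}\to c}(k, jk) \tag{108}\\
\rho_{c\to v}(i, ijk) &:= \lambda_{V(c)\setminus\{v\}\to c}(jk, jk) \tag{109}\\
\rho_{c\to v}(ij, ij) &:= \lambda_{V(c)\setminus\{v\}\to c}(k, ijk) \tag{110}\\
\rho_{c\to v}(ij, ijk) &:= \lambda_{V(c)\setminus\{v\}\to c}(ij, ijk) + \lambda_{V(c)\setminus\{v\}\to c}(jk, ijk) + \lambda_{V(c)\setminus\{v\}\to c}(ik, ijk)
+ \lambda_{V(c)\setminus\{v\}\to c}(ijk, ijk) \tag{111}\\
\rho_{c\to v}(ijk, ijk) &:= \lambda_{V(c)\setminus\{v\}\to c}(ij, ijk) + \lambda_{V(c)\setminus\{v\}\to c}(jk, ijk) + \lambda_{V(c)\setminus\{v\}\to c}(ik, ijk)
+ \lambda_{V(c)\setminus\{v\}\to c}(ijk, ijk) \tag{112}\\
\mu_v(i) &:= \prod_{c\in C(v)} \big(\rho_{c\to v}(i, ij) + \rho_{c\to v}(i, ik) + \rho_{c\to v}(i, ijk)\big)
- \prod_{c\in C(v)} \big(\rho_{c\to v}(i, ij) + \rho_{c\to v}(i, ijk)\big) \notag\\
&\quad - \prod_{c\in C(v)} \big(\rho_{c\to v}(i, ik) + \rho_{c\to v}(i, ijk)\big) + \prod_{c\in C(v)} \rho_{c\to v}(i, ijk) \tag{113}\\
\mu_v(ij) &:= \prod_{c\in C(v)} \big(\rho_{c\to v}(ij, ij) + \rho_{c\to v}(ij, ijk)\big) - \prod_{c\in C(v)} \rho_{c\to v}(ij, ijk) \tag{114}\\
\mu_v(ijk) &:= \prod_{c\in C(v)} \rho_{c\to v}(ijk, ijk). \tag{115}
\end{align}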

Before we begin to consider the BP-to-PTP reduction for 3-COL problems, it is helpful to
take a closer look at the BP-to-PTP reduction mechanism for $k$-SAT problems.

In Theorem 3, one may notice the two conditions governing the BP-to-PTP reduction for
$k$-SAT problems, namely, the initialization condition and the normalization condition. Arguably,
the normalization condition imposed on the BP messages, although serving to simplify the
form of BP messages and possibly to alter the interpretation of the messages, does not have a
critical impact on the message-passing dynamics. This is because the normalization condition
merely involves a scaling operation, without which BP messages and PTP messages for $k$-SAT
would still be equivalent up to a scaling factor. On the other hand, the initialization condition in
Theorem 3 plays an important role in the message-passing dynamics. In essence, the initialization
condition assures that any right message depends only on the right state it involves. Using the
"intention-command" analogy, in which one views each right state as storing the "command"
sent from a constraint and each left state as storing the "intention" of a variable, this condition
simply requires that the distribution of the command sent to any variable does not depend on the
intention of the variable. It is remarkable that this interpretation of the initialization condition
in Theorem 3 (or (88)) is consistent with the PTP message-passing rule, in which any right
message (i.e., outgoing distribution of command) sent to a variable is independent of (or, not
a function of) the incoming intention from that variable. This is however not the case for the
right messages of BP in general.

We are then motivated to formalize this condition for general CSPs as what we call the "state-
decoupling" condition and impose it on the right messages of BP, so as to achieve consistency
with PTP. It is intuitively sensible that such consistency is needed in the reduction of PTP
from BP.

Definition 2 (State-Decoupling Condition): For an arbitrary CSP and at any given iteration,
the BP messages based on the MRF formalism defined by (69), (70), and (71) are said to
satisfy the state-decoupling condition if for every $(v - c)$, the right message
$\rho_{c\to v}(s^L_{v,c}, s^R_{v,c})$ is only a function of the right state $s^R_{v,c}$, namely, if
$\rho_{c\to v}(s^L_{v,c}, s^R_{v,c}) = \rho_{c\to v}(s^R_{v,c}, s^R_{v,c})$ for any fixed
$s^R_{v,c} \in (\chi^*)^{\{v\}}$ and any $s^L_{v,c} \subseteq s^R_{v,c}$.

It is clear that the initialization condition for BP-to-PTP reduction for k-SAT in Theorem 3 is
equivalent to this condition, where we note that the condition in Theorem 3 only puts restrictions
on the right messages with right state equal to *, since for the remaining case with right state
equal to L this condition is trivially satisfied.
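As a small illustration, the state-decoupling condition can be checked mechanically on a table of right messages. The sketch below assumes the condition is read as "the message value is the same for every non-empty left state contained in the right state", which is how the proof of Lemma 13 spells it out for 3-COL. The function names and the dictionary layout are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def nonempty_subsets(token):
    """All non-empty subsets of a token (a frozenset of values)."""
    items = sorted(token)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def is_state_decoupled(rho, tol=1e-12):
    """rho maps (left_state, right_state) -> right-message value.

    Returns True when, for every right state s_r and every non-empty
    left state s_l contained in s_r, rho[(s_l, s_r)] equals rho[(s_r, s_r)].
    Missing entries are treated as 0 (an assumption of this sketch).
    """
    right_states = {s_r for (_, s_r) in rho}
    for s_r in right_states:
        reference = rho.get((s_r, s_r), 0.0)
        for s_l in nonempty_subsets(s_r):
            if abs(rho.get((s_l, s_r), 0.0) - reference) > tol:
                return False
    return True


f = frozenset
# A toy table with a single right state {1, 2}: all three left states agree.
decoupled = {(f({1}), f({1, 2})): 0.3,
             (f({2}), f({1, 2})): 0.3,
             (f({1, 2}), f({1, 2})): 0.3}
# Same table, but the value now depends on the left state.
coupled = dict(decoupled)
coupled[(f({1}), f({1, 2}))] = 0.1
```

Here `is_state_decoupled(decoupled)` holds while `is_state_decoupled(coupled)` does not, which mirrors the distinction between PTP-style right messages and general BP right messages.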

It is interesting to observe, as shown in Proposition 1, that for k-SAT problems, as long as
the state-decoupling condition is imposed in the initialization of the BP messages, the condition
is preserved in every iteration. This serves as the basis for BP to reduce to PTP as shown in
Theorem 3 and its proof. For 3-COL problems, however, the result corresponding to Proposition
1 does not hold.

Lemma 13: For 3-COL problems, if the state-decoupling condition holds for BP messages
both in iteration $l$ and in iteration $l+1$, then the right message in iteration $l$ must satisfy, for
every $(v - c)$,
\[
\rho^{(l)}_{c\to v}(s^L_{v,c}, s^R_{v,c}) = 0
\]
as long as the right state $s^R_{v,c} \neq 123$.

Proof: In 3-COL problems, the state-decoupling condition can be expressed as
\begin{align*}
\rho_{c\to v}(i, ij) &= \rho_{c\to v}(ij, ij)\\
\rho_{c\to v}(i, ijk) &= \rho_{c\to v}(ij, ijk) = \rho_{c\to v}(ijk, ijk).
\end{align*}

Note that we only need to prove the lemma for $s^R_{v,c}$ being a pair of assignments, since when
$s^R_{v,c}$ is a singleton, all right messages equal 0 by the construction of the MRF and Lemma 12
describing the BP message-update rule for 3-COL.

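In iteration $l+1$, following the 3-COL message-update equations (103) to (112) and using a
superscript to denote the iteration number, we have
\begin{align}
\rho^{(l+1)}_{c\to v}(i, ij) &= \lambda^{(l+1)}_{V(c)\setminus\{v\}\to c}(k, jk) \nonumber\\
&= \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \left(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, jk) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ik) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk)\right) \nonumber\\
&\quad - \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \left(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, jk) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk)\right), \tag{116}\\
\rho^{(l+1)}_{c\to v}(ij, ij) &= \lambda^{(l+1)}_{V(c)\setminus\{v\}\to c}(k, ijk) \nonumber\\
&= \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \left(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, jk) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ik) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk)\right) \nonumber\\
&\quad - \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \left(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ik) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk)\right) \nonumber\\
&\quad - \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \left(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, jk) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk)\right) \nonumber\\
&\quad + \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk). \tag{117}
\end{align}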

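Now suppose that the state-decoupling condition as expressed above can be satisfied both in
iteration $l$ and in iteration $l+1$. Then we may equate the right-hand sides of (116) and (117),
namely,
\begin{align*}
&\prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \left(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, jk) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ik) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk)\right)\\
&\quad - \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \left(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, jk) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk)\right)\\
&= \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \left(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, jk) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ik) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk)\right)\\
&\quad - \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \left(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ik) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk)\right)\\
&\quad - \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \left(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, jk) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk)\right) + \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk),
\end{align*}
which implies
\begin{align*}
\prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \left(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, jk) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk)\right)
&= \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \left(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ik) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk)\right)\\
&\quad + \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \left(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, jk) + \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk)\right)\\
&\quad - \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}} \rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ijk).
\end{align*}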

Since every right message must be non-negative, when the state-decoupling condition is satisfied
in iteration $l$, the only way to make the above equality hold is the case where
\[
\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k, ik) = 0.
\]
Under the state-decoupling condition, this also means $\rho^{(l)}_{b\to V(c)\setminus\{v\}}(ik, ik) = 0$. Thus we
establish this lemma.



This lemma suggests that when the BP messages satisfy the state-decoupling condition in two
consecutive iterations, then the right messages must take a trivial form, equal to $[s^R_{v,c} = 123]$
up to scale, and contain no information.

At this point, one is left with either the option of concluding that PTP (or SP) is not an
instance of BP for 3-COL problems (and hence for general CSPs) or the option of doubting
the usefulness of the state-decoupling condition in BP-to-SP reduction. In the remainder of this
subsection, we will clear this doubt and assert the usefulness of the state-decoupling condition
by showing that when the state-decoupling condition is manually imposed on the BP messages
in each iteration, BP still reduces to PTP for 3-COL problems. That will allow us to conclude
that PTP (or SP) is not a special case of BP.

To force the state-decoupling condition to be satisfied in each BP iteration, we now modify the
message-passing rule of BP on the Forney graph representation of general CSPs, and introduce
a "new" message-passing procedure which we refer to as the state-decoupled BP or SDBP.
We note that introducing this "new" message-passing procedure is solely for the purpose of
verifying the usefulness of the state-decoupling condition and hopefully arriving at a unified
reduction mechanism for PTP to reduce from BP (or more precisely from SDBP). Beyond this
purpose, we have no intention to justify the introduction of SDBP.

Identical to BP at local function vertices, SDBP differs from BP in that messages passed
from the right functions need additional processing (so that the state-decoupling condition
is satisfied) before they are passed to the left functions. In SDBP, there are three kinds of
messages: the right message $\rho_{c\to v}$ is computed at right function $f_c$ to pass along the edge to $g_v$;
the state-decoupled right message $\rho^*_{c\to v}$ is computed at the edge connecting $f_c$ and $g_v$, satisfies
the state-decoupling condition, is computed based only on the right message $\rho_{c\to v}$ on the same edge,
and is passed to left function $g_v$; the left message $\lambda_{v\to c}$ is computed at the left function $g_v$ to
pass along the edge connecting to $f_c$. The precise definition of the SDBP message-update rule is
given next.

Definition 3: The SDBP message-update rule is defined as follows. 



n <b ■ n (118) 

beC(D) / heC{v)\{c} 

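\[
\rho_{c\to v}(s^L_{v,c}, s^R_{v,c}) := \sum_{s^L_{V(c)\setminus\{v\},c}} \left[s^R_{v,c} = F_c\left(s^L_{V(c)\setminus\{v\},c}\right)\right] \prod_{u\in V(c)\setminus\{v\}} \lambda_{u\to c}\left(s^L_{u,c}, F_c\left(s^L_{V(c)\setminus\{u\},c}\right)\right) \tag{120}
\]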



PviVv) ■= ^ Vv 



n <c n pi^vKc) (121) 

cec{v) J cec{v) 



where $\delta = 1 / \sum_{s^R_{v,c}\in(\chi^*)^{\{v\}}} \rho_{c\to v}(s^R_{v,c}, s^R_{v,c})$.

Comparing this definition with the BP message-update rule in Lemma 8, the following remarks
are in order. First, the expression of right messages $\rho$ in terms of left messages $\lambda$ is identical
to that in BP. Second, each state-decoupled message $\rho^*_{c\to v}$ may be regarded as a function of
$(s^L_{v,c}, s^R_{v,c})$ but the value of the function depends only on the $s^R_{v,c}$ component, namely that the
(state-decoupled) right message satisfies the state-decoupling condition. Furthermore, the expression
of $\lambda$ in terms of $\rho^*$ is precisely the same as the expression of $\lambda$ in terms of $\rho$ in BP.

Following this definition, the next lemma summarizes the SDBP message-update rule for 
3-COL problems. 

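Lemma 14: Let $\{\omega_v : v \in V\}$ in 3-COL problems be defined as in (102). The SDBP message-
update rule is then:
\begin{align}
\lambda_{v\to c}(i, ij) &:= \prod_{b\in C(v)\setminus\{c\}} \left(\rho^*_{b\to v}(ij) + \rho^*_{b\to v}(ik) + \rho^*_{b\to v}(ijk)\right) - \prod_{b\in C(v)\setminus\{c\}} \left(\rho^*_{b\to v}(ij) + \rho^*_{b\to v}(ijk)\right) \tag{122}\\
\lambda_{v\to c}(i, ijk) &:= \prod_{b\in C(v)\setminus\{c\}} \left(\rho^*_{b\to v}(ij) + \rho^*_{b\to v}(ik) + \rho^*_{b\to v}(ijk)\right) - \prod_{b\in C(v)\setminus\{c\}} \left(\rho^*_{b\to v}(ij) + \rho^*_{b\to v}(ijk)\right) \nonumber\\
&\quad - \prod_{b\in C(v)\setminus\{c\}} \left(\rho^*_{b\to v}(ik) + \rho^*_{b\to v}(ijk)\right) + \prod_{b\in C(v)\setminus\{c\}} \rho^*_{b\to v}(ijk) \tag{123}\\
\lambda_{v\to c}(ij, ij) &:= \prod_{b\in C(v)\setminus\{c\}} \left(\rho^*_{b\to v}(ij) + \rho^*_{b\to v}(ijk)\right) \tag{124}\\
\lambda_{v\to c}(ij, ijk) &:= \prod_{b\in C(v)\setminus\{c\}} \left(\rho^*_{b\to v}(ij) + \rho^*_{b\to v}(ijk)\right) - \prod_{b\in C(v)\setminus\{c\}} \rho^*_{b\to v}(ijk) \tag{125}\\
\lambda_{v\to c}(ijk, ijk) &:= \prod_{b\in C(v)\setminus\{c\}} \rho^*_{b\to v}(ijk) \tag{126}\\
\rho^*_{c\to v}(ij) &:= \delta \cdot \lambda_{V(c)\setminus\{v\}\to c}(k, ijk) \tag{127}\\
\rho^*_{c\to v}(ijk) &:= \delta \cdot \left(\lambda_{V(c)\setminus\{v\}\to c}(ij, ijk) + \lambda_{V(c)\setminus\{v\}\to c}(ik, ijk) + \lambda_{V(c)\setminus\{v\}\to c}(jk, ijk) + \lambda_{V(c)\setminus\{v\}\to c}(ijk, ijk)\right) \tag{128}
\end{align}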



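\begin{align}
\mu_v(i) &:= \prod_{c\in C(v)} \left(\rho^*_{c\to v}(ij) + \rho^*_{c\to v}(ik) + \rho^*_{c\to v}(ijk)\right) - \prod_{c\in C(v)} \left(\rho^*_{c\to v}(ij) + \rho^*_{c\to v}(ijk)\right) \nonumber\\
&\quad - \prod_{c\in C(v)} \left(\rho^*_{c\to v}(ik) + \rho^*_{c\to v}(ijk)\right) + \prod_{c\in C(v)} \rho^*_{c\to v}(ijk) \tag{129}\\
\mu_v(ij) &:= \prod_{c\in C(v)} \left(\rho^*_{c\to v}(ij) + \rho^*_{c\to v}(ijk)\right) - \prod_{c\in C(v)} \rho^*_{c\to v}(ijk) \tag{130}\\
\mu_v(ijk) &:= \prod_{c\in C(v)} \rho^*_{c\to v}(ijk), \tag{131}
\end{align}
where $\delta$ is such that
\[
\rho^*_{c\to v}(ijk) + \sum_{ij} \rho^*_{c\to v}(ij) = 1.
\]

^Although it is possible to formulate SDBP in a more compact form by, for example, suppressing $\rho$ and expressing the message-
update rule only using $\rho^*$ and $\lambda$, we feel the current way of formulating SDBP makes it easier to compare SDBP with BP.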
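The four-product form of the left message in (123) can be read as an inclusion-exclusion count: each neighbouring constraint independently contributes a token from {ij, ik, jk, ijk}, and the left message collects the weight of all configurations whose tokens intersect to a prescribed token. The following sketch checks this reading numerically by brute force. It is only an illustration; the function names and the example weights are hypothetical, and the weights are not required to be normalized.

```python
from itertools import product
from math import prod, isclose

# The four tokens that a 3-COL constraint can send (colours 1, 2, 3).
TOKENS = [frozenset(t) for t in ({1, 2}, {1, 3}, {2, 3}, {1, 2, 3})]


def intersection_mass(rhos):
    """Brute force: rhos is a list of dicts token -> weight, one per
    neighbouring constraint. Returns a dict mapping each possible
    intersection to the total weight of configurations producing it."""
    mass = {}
    for combo in product(TOKENS, repeat=len(rhos)):
        weight = prod(r[t] for r, t in zip(rhos, combo))
        inter = frozenset.intersection(*combo)
        mass[inter] = mass.get(inter, 0.0) + weight
    return mass


def closed_form(rhos, i, j, k):
    """Closed forms patterned on (123), (125) and (126)."""
    ij, ik, ijk = frozenset({i, j}), frozenset({i, k}), frozenset({i, j, k})
    lam_i = (prod(r[ij] + r[ik] + r[ijk] for r in rhos)
             - prod(r[ij] + r[ijk] for r in rhos)
             - prod(r[ik] + r[ijk] for r in rhos)
             + prod(r[ijk] for r in rhos))
    lam_ij = prod(r[ij] + r[ijk] for r in rhos) - prod(r[ijk] for r in rhos)
    lam_ijk = prod(r[ijk] for r in rhos)
    return lam_i, lam_ij, lam_ijk


# Arbitrary example weights for three neighbouring constraints.
rhos = [dict(zip(TOKENS, w)) for w in ([0.1, 0.2, 0.3, 0.4],
                                       [0.25, 0.25, 0.25, 0.25],
                                       [0.4, 0.1, 0.2, 0.3])]
mass = intersection_mass(rhos)
lam_i, lam_ij, lam_ijk = closed_form(rhos, 1, 2, 3)
```

With these inputs, `mass[frozenset({1})]`, `mass[frozenset({1, 2})]` and `mass[frozenset({1, 2, 3})]` agree with `lam_i`, `lam_ij` and `lam_ijk` up to floating-point error, which is the probabilistic reading that makes the correspondence with PTP natural.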

It is now possible to establish a correspondence between PTP and SDBP messages for 3-COL 
problems. 

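Theorem 4: For 3-COL problems, the correspondence between PTP and SDBP message-
update rules is
\begin{align}
\lambda^{\mathrm{PTP}}_{v\to c}(i) &\leftrightarrow \lambda^{\mathrm{SDBP}}_{v\to c}(i, ijk) \tag{132}\\
\lambda^{\mathrm{PTP}}_{v\to c}(ij) &\leftrightarrow \lambda^{\mathrm{SDBP}}_{v\to c}(ij, ijk) \tag{133}\\
\lambda^{\mathrm{PTP}}_{v\to c}(ijk) &\leftrightarrow \lambda^{\mathrm{SDBP}}_{v\to c}(ijk, ijk) \tag{134}\\
\rho^{\mathrm{norm(PTP)}}_{c\to v}(ij) &\leftrightarrow \rho^{*(\mathrm{SDBP})}_{c\to v}(ij) \tag{135}\\
\rho^{\mathrm{norm(PTP)}}_{c\to v}(ijk) &\leftrightarrow \rho^{*(\mathrm{SDBP})}_{c\to v}(ijk) \tag{136}\\
\mu^{\mathrm{PTP}}_{v}(i) &\leftrightarrow \mu^{\mathrm{SDBP}}_{v}(i) \tag{137}\\
\mu^{\mathrm{PTP}}_{v}(ij) &\leftrightarrow \mu^{\mathrm{SDBP}}_{v}(ij) \tag{138}\\
\mu^{\mathrm{PTP}}_{v}(ijk) &\leftrightarrow \mu^{\mathrm{SDBP}}_{v}(ijk). \tag{139}
\end{align}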



Proof: We will first prove that if the "right correspondence" (namely, (135) and (136))
holds, then the "left correspondence" (namely, (132) to (134)) holds.

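Suppose that the right correspondence holds (where the symbol $\leftrightarrow$ in (135) and (136) is
understood as equality). Then
\begin{align*}
\lambda^{\mathrm{SDBP}}_{v\to c}(i, ijk) &= \prod_{b\in C(v)\setminus\{c\}} \left(\rho^{*(\mathrm{SDBP})}_{b\to v}(ij) + \rho^{*(\mathrm{SDBP})}_{b\to v}(ik) + \rho^{*(\mathrm{SDBP})}_{b\to v}(ijk)\right)
 - \prod_{b\in C(v)\setminus\{c\}} \left(\rho^{*(\mathrm{SDBP})}_{b\to v}(ij) + \rho^{*(\mathrm{SDBP})}_{b\to v}(ijk)\right)\\
&\quad - \prod_{b\in C(v)\setminus\{c\}} \left(\rho^{*(\mathrm{SDBP})}_{b\to v}(ik) + \rho^{*(\mathrm{SDBP})}_{b\to v}(ijk)\right) + \prod_{b\in C(v)\setminus\{c\}} \rho^{*(\mathrm{SDBP})}_{b\to v}(ijk)\\
&= \prod_{b\in C(v)\setminus\{c\}} \left(\rho^{\mathrm{norm(PTP)}}_{b\to v}(ij) + \rho^{\mathrm{norm(PTP)}}_{b\to v}(ik) + \rho^{\mathrm{norm(PTP)}}_{b\to v}(ijk)\right)
 - \prod_{b\in C(v)\setminus\{c\}} \left(\rho^{\mathrm{norm(PTP)}}_{b\to v}(ij) + \rho^{\mathrm{norm(PTP)}}_{b\to v}(ijk)\right)\\
&\quad - \prod_{b\in C(v)\setminus\{c\}} \left(\rho^{\mathrm{norm(PTP)}}_{b\to v}(ik) + \rho^{\mathrm{norm(PTP)}}_{b\to v}(ijk)\right) + \prod_{b\in C(v)\setminus\{c\}} \rho^{\mathrm{norm(PTP)}}_{b\to v}(ijk)\\
&= \lambda^{\mathrm{PTP}}_{v\to c}(i).
\end{align*}

Similarly, we can prove that $\lambda^{\mathrm{SDBP}}_{v\to c}(ij, ijk) = \lambda^{\mathrm{PTP}}_{v\to c}(ij)$ and $\lambda^{\mathrm{SDBP}}_{v\to c}(ijk, ijk) = \lambda^{\mathrm{PTP}}_{v\to c}(ijk)$.
It then follows that the left correspondence holds.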

Now we prove that if the left correspondence holds, then the right correspondence holds. 
Suppose that the left correspondence holds, then we have 

= a(^.A;r-',„,^,(k)) 
= ■ Ai,^™2)-ik, ijk) 

where a = 1/ Y.im„-)' pf^\t) and /3 = 1/ Eted-jU'iM") 

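Since both $\rho^{*(\mathrm{SDBP})}_{c\to v}$ and $\rho^{\mathrm{norm(PTP)}}_{c\to v}$ are normalized, it must hold that $\alpha\beta = \delta$. This in-
dicates that $\rho^{\mathrm{norm(PTP)}}_{c\to v}(ij) = \rho^{*(\mathrm{SDBP})}_{c\to v}(ij)$. Following a similar procedure, one can show that
$\rho^{\mathrm{norm(PTP)}}_{c\to v}(ijk) = \rho^{*(\mathrm{SDBP})}_{c\to v}(ijk)$. This implies that the right correspondence holds.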

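At this point, we have established the correspondence between passed messages in PTP and
those in SDBP.

Now we will prove the summary correspondence (namely, (137) to (139)).
\begin{align*}
\mu^{\mathrm{SDBP}}_{v}(i) &= \prod_{c\in C(v)} \left(\rho^{*(\mathrm{SDBP})}_{c\to v}(ij) + \rho^{*(\mathrm{SDBP})}_{c\to v}(ik) + \rho^{*(\mathrm{SDBP})}_{c\to v}(ijk)\right)
 - \prod_{c\in C(v)} \left(\rho^{*(\mathrm{SDBP})}_{c\to v}(ij) + \rho^{*(\mathrm{SDBP})}_{c\to v}(ijk)\right)\\
&\quad - \prod_{c\in C(v)} \left(\rho^{*(\mathrm{SDBP})}_{c\to v}(ik) + \rho^{*(\mathrm{SDBP})}_{c\to v}(ijk)\right) + \prod_{c\in C(v)} \rho^{*(\mathrm{SDBP})}_{c\to v}(ijk)\\
&= \prod_{c\in C(v)} \left(\rho^{\mathrm{norm(PTP)}}_{c\to v}(ij) + \rho^{\mathrm{norm(PTP)}}_{c\to v}(ik) + \rho^{\mathrm{norm(PTP)}}_{c\to v}(ijk)\right)
 - \prod_{c\in C(v)} \left(\rho^{\mathrm{norm(PTP)}}_{c\to v}(ij) + \rho^{\mathrm{norm(PTP)}}_{c\to v}(ijk)\right)\\
&\quad - \prod_{c\in C(v)} \left(\rho^{\mathrm{norm(PTP)}}_{c\to v}(ik) + \rho^{\mathrm{norm(PTP)}}_{c\to v}(ijk)\right) + \prod_{c\in C(v)} \rho^{\mathrm{norm(PTP)}}_{c\to v}(ijk)\\
&= \mu^{\mathrm{PTP}}_{v}(i).
\end{align*}

Similarly, we can prove that $\mu^{\mathrm{SDBP}}_{v}(ij) = \mu^{\mathrm{PTP}}_{v}(ij)$ and $\mu^{\mathrm{SDBP}}_{v}(ijk) = \mu^{\mathrm{PTP}}_{v}(ijk)$. This
proves the summary correspondence.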

■ 

At this point, it should be convincing that the state-decoupling condition is an important
ingredient in the reduction of BP to PTP. It is worth noting that in the case of k-SAT problems,
this condition can be imposed simply by the initialization of BP messages. However, in the case
of 3-COL problems, one needs to manually impose this condition at each iteration, namely,
carry out SDBP instead of BP, so as to arrive at an equivalence to PTP messages. This extra
complexity involved in 3-COL problems then suggests that for 3-COL problems, PTP and hence
SP are not a special case of BP. Thus one may conclude that SP is not BP for general
CSPs.

Now it remains to investigate, for general CSPs, whether the state-decoupling condition is
sufficient for PTP or weighted PTP to reduce from BP, or equivalently whether and when PTP
and weighted PTP are SDBP.

D. The Reduction of Weighted PTP from SDBP for General CSPs

Up to this point, we see that the state-decoupling condition critically governs the reduction of
BP to PTP (or weighted PTP) for k-SAT problems and 3-COL problems. In this subsection, we
will however show that the state-decoupling condition is not sufficient for BP (more precisely
SDBP) to reduce to PTP and that an additional condition is needed in the general context.

Definition 4 (Forceable Token): For any $(v - c)$, we say that a token $t_v \in (\chi^*)^{\{v\}}$ is forceable
by $F_c$ if there exists a rectangle $\prod_{u\in V(c)\setminus\{v\}} t_u$ on $V(c)\setminus\{v\}$ such that
$F_c\left(\prod_{u\in V(c)\setminus\{v\}} t_u\right) = t_v$.

We will denote by $\mathcal{F}_c(v)$ the set of all tokens on $v$ that are forceable by $F_c$. Let
$A_c(v) := \bigcup_{t\in\mathcal{F}_c(v)} t$. Since $A_c(v)$ is the token that $F_c$ forces from the full rectangle on
$V(c)\setminus\{v\}$, it follows that $A_c(v)$ is always forceable. In
fact, it is easy to see that $A_c(v)$ is the "largest" forceable token on $v$ by $F_c$, in the sense of
containing all other forceable tokens as its subsets, due to the monotonicity of $F_c(\cdot)$.

In k-SAT problems, for any $(v - c)$, it is easy to see that $\mathcal{F}_c(v) = \{*, L\}$, and $A_c(v) = *$.
In 3-COL problems, for any $(v - c)$, it is easy to see that $\mathcal{F}_c(v) = \{123, 12, 23, 13\}$, and
$A_c(v) = 123$.
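The 3-COL claim above can be reproduced by enumeration. The sketch below assumes that the constraint acts on tokens in the way suggested by the examples in this section: the token forced on $v$ consists of every colour of $v$ that is compatible with at least one colour in the token on the other endpoint. The helper names are hypothetical.

```python
from itertools import combinations

ALPHABET = (1, 2, 3)


def tokens(alphabet=ALPHABET):
    """All non-empty subsets of the alphabet, as frozensets."""
    return [frozenset(c) for r in range(1, len(alphabet) + 1)
            for c in combinations(alphabet, r)]


def force_3col(t_u):
    """Token forced on v by the edge constraint x_u != x_v, given token t_u:
    every colour of v compatible with at least one colour in t_u."""
    return frozenset(x_v for x_v in ALPHABET
                     if any(x_u != x_v for x_u in t_u))


def forceable_tokens(force):
    """Set of all tokens that the constraint can force on v."""
    return {force(t) for t in tokens()}


def largest_forceable(force):
    """Union of all forceable tokens, i.e. A_c(v) in the text."""
    return frozenset().union(*forceable_tokens(force))
```

Under this reading, a singleton token on $u$ forces the complementary pair on $v$, and any larger token forces 123, so `forceable_tokens(force_3col)` is {12, 13, 23, 123} and `largest_forceable(force_3col)` is 123.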

For any $(c - v)$, let $A_{\sim c}(v)$ be defined by
\[
A_{\sim c}(v) := \bigcap_{b\in C(v)\setminus\{c\}} A_b(v).
\]

Definition 5 (Locally Compatible Constraint): A constraint $F_c$ is said to be locally compatible
if for any $v \in V(c)$, any forceable token $t_v \in \mathcal{F}_c(v)$, any rectangle $t' \in F_c^{-1}(t_v)$ on $V(c)\setminus\{v\}$
(where $F_c^{-1}(t_v)$ is the set of all rectangles $y_{V(c)\setminus\{v\}}$ on $V(c)\setminus\{v\}$ such that $F_c(y_{V(c)\setminus\{v\}}) = t_v$),
and any $u \in V(c)\setminus\{v\}$, it holds that
\[
A_{\sim c}(u) \subseteq F_c\left(t_v \times t'_{V(c)\setminus\{u,v\}}\right).
\]

We note that the local compatibility of a constraint Fc as defined above is not simply a property 
of Fc itself. It also relies on the structure of all constraints that are distance-2 away from Fc in 
the factor graph. 

Theorem 5: Let the set of obedience conditionals $\{\omega_v : v \in V\}$ be given, where each $v \in V$
corresponds to a coordinate of a CSP. Let the MRF of the CSP (specified via (69), (70)
and (71)) and the weighted PTP for the CSP both be parametrized by $\{\omega_v : v \in V\}$. Then if
every constraint of the CSP is locally compatible, the SDBP derived from the MRF is equivalent
to the weighted PTP, where the correspondence is
\[
\rho^{\mathrm{norm(PTP)}}_{c\to v} \leftrightarrow \rho^{*(\mathrm{SDBP})}_{c\to v}.
\]
Conversely, if such an equivalence holds for every choice of $\{\omega_v : v \in V\}$, then every constraint
of the CSP must be locally compatible.

Alternatively phrased, Theorem 5 suggests that if the state-decoupling condition is satisfied
in every iteration of BP, the local compatibility condition on all constraints is the necessary
and sufficient condition for weighted PTP to reduce from BP. We note that Theorem 5 only
refers to the equivalence of right messages. It is however straightforward to verify (as seen in
earlier proofs of equivalent results in this paper) that right equivalence implies the summary
equivalence.

This theorem answers the question when SP is SDBP in a general setting. 
Proof: 

Following the message-update rule of SDBP, 

P*c^^^^\Sv,c) ^ ( [''^v,c = '^c{sy(^c)\{v},c)] Y\. ^u^f^"* {''^u,c^^c {sv{c)\{u},c)) 

uev{c)\{v} 



''V(c)\{v},c 



I 



E 



[sjc - Fc (sy(c)\M,c)] n X] 



V(c)\{l'},c \ ^ ' ^ It, 



u,C{u)\{c) 



Pi ■^u,b ) ^ f c {Sy(c)\{u,v},c X S^c) 
.h&C(u)\{c} 



■ n pI'thsi,)] (140) 

6eC(«)\{c} / 

Similarly following the message-update rule of weighted PTP, we have 



E 

W{c)\{v}^c \ U&V{c)\{v} tc(u)\(c)- 



I n • n pTZu'^''^''\h^u) .(141) 

beC{u)\{c} ) \b&C{u)\{c} ) ) 

Identifying every right state $s^R_{u,b}$ in (140) with token $t_{b\to u}$ in (141) and every left state $s^L_{u,c}$ in
(140) with token $t_{u\to c}$ in (141), the only difference between (140) and (141) is the argument
of function $\omega_u$. (We note that since both $\rho^{*(\mathrm{SDBP})}_{c\to v}$ and $\rho^{\mathrm{norm(PTP)}}_{c\to v}$ are normalized, the scaling
constants in (140) and (141) are necessarily the same.) We now prove the sufficiency and necessity
of the local compatibility condition for the equivalence between $\rho^{\mathrm{norm(PTP)}}_{c\to v}$ and $\rho^{*(\mathrm{SDBP})}_{c\to v}$ via the
following chain of two-way implications.



^ 



Pi ^u,b ^ f c {^V{c)\{u,v},c ^ ^v,c) 



ybeC{u)\{c} 



n 



beC{u)\{c} 
L 



Vt; e ^(c) and every (s^^, s(;(,)\|^} J in the support of [sj^ = , 
Wu G y(c) \ {v} and every choice of \C{u) \ {c}\ tokens on {u}, {s^^, : 6 G C{u) \ {c}} 



with each sf^ in the support of p[^^^. 

,beC{u)\{c} J bGC{u)\{c} 



\/v e V{c) and every (s^^, such that G J'c{v) and G Fj,"(^s^ ,^y, 

Vn G y(c) \ {v} and every choice of \C{u) \ {c}\ tokens on {u}, {s^f, : b G C{u) \ {c}} 
with each s^f, G J^hiu). 

Pi — f'c (■5y(c)\{«,«},c X "5^c) 

beC(n)\{c} 

V^; G V^(c) and every 4(c)\{^},c) such that s^^ G J'cl^^) and 4(c)\{^,},c ^ F;^(sJJ, 

Vn G y(c) \ {v} and every choice of \C{u) \ {c}| tokens on {n}, {s^j, : 6 G C{u) \ {c}} 
with each s^^^e J^biu). 

bec(«)\{c} 



Vt; G V(c) and every (s^^, such that s^;, G J?^c('y) and s 

and every u G ^(c) \ {v}. 



V{c)\{v},c ^ fc "^(-^^c); 






^ Ar^du) C Fc {sy(c)\{u,v},c ^ ^v,c) y 

Wv e V{c) and every (sj,, 4(c)\{^},c) such that s^'^ G and 4{c)\{^;},c ^ F;^(sJJ, 

and every u E V{c) \ {v}. 
■v^ Constraint Fc is locally compatible. 

Thus 

^norm(PTP) ^ ^*(SDBP)^ ^^^^y ^ 

Every constraint Fc is locally compatible. 

■ 

Now it is easy to verify that for both k-SAT and 3-COL problems, the fact that PTP or
weighted PTP can be reduced from BP with the state-decoupling condition imposed is due to the
fact that every constraint is locally compatible.

For k-SAT problems, as noted earlier, $\mathcal{F}_c(v) = \{L, *\}$. If we pick $t_v$ to be either token from
$\mathcal{F}_c(v)$, then for any $t' \in F_c^{-1}(t_v)$ and any $u \in V(c)\setminus\{v\}$, it can be verified that
$F_c(t'_{V(c)\setminus\{u,v\}} \times t_v) = *$. This makes $A_{\sim c}(u) \subseteq F_c(t'_{V(c)\setminus\{u,v\}} \times t_v)$ always satisfied,
independent of the factor graph structure of the problem instance.

For 3-COL problems, as noted earlier, we see $\mathcal{F}_c(v) = \{123, 12, 23, 13\}$. Suppose that $u$ is
the only other coordinate (except $v$) that is involved in constraint $F_c$. If we pick $t_v$ to be any
token from $\mathcal{F}_c(v)$, then $F_c(t_v) = 123$. This again makes $A_{\sim c}(u) \subseteq F_c(t_v)$ always satisfied,
independent of the factor graph structure of the problem instance.

That is, in both k-SAT and 3-COL problems, the structure of each local constraint alone
guarantees that the local compatibility condition is satisfied by every constraint, irrespective of how a
constraint interacts with other constraints (those at distance 2) as is generally required in the
local compatibility condition. We generalize this fact in the following corollary, which follows
immediately from Theorem 5 and provides a sufficient condition for SDBP to reduce to PTP without
relying on the interaction of neighboring constraints. For CSPs constructed with generic local
constraints on a random factor graph structure, the corollary may turn out to be useful.

Corollary 1: Let both the MRF of the CSP (specified via (69), (70) and (71)) and the weighted
PTP for the CSP be parametrized by the same $\{\omega_v : v \in V\}$. Suppose that every constraint $F_c$
is such that for any $v \in V(c)$, any forceable token $t_v \in \mathcal{F}_c(v)$, any rectangle $t' \in F_c^{-1}(t_v)$ on
$V(c)\setminus\{v\}$, and any $u \in V(c)\setminus\{v\}$, it holds that

Then SDBP derived from the MRF is equivalent to weighted PTP, where the correspondence
is
\[
\rho^{\mathrm{norm(PTP)}}_{c\to v} \leftrightarrow \rho^{*(\mathrm{SDBP})}_{c\to v}.
\]

Fig. 5. A portion of a factor graph G.

For completeness, we conclude this section by constructing an example of a CSP in which the
local compatibility condition is not satisfied by every constraint.

Suppose that $F_c$ and $F_b$ are two of the constraints defining a CSP, and the factor graph rep-
resenting the CSP locally obeys the structure shown in Figure 5. Suppose that each variable of the
CSP has alphabet $\chi = \{0, 1, 2\}$ and that $F_c$ is defined as $F_c := \{(0_v, 0_u), (0_v, 1_u), (1_v, 2_u), (2_v, 2_u)\}$.
Suppose that $F_b$ is defined as $F_b := \{(0_u, 0_w), (1_u, 1_w), (2_u, 1_w)\}$. Note that $\mathcal{F}_c(v) = \{0_v, 12_v, 012_v\}$,
and it is easy to verify that $A_{\sim c}(u) = A_b(u) = F_b(012_w) = 012_u$. Now if we pick $t_v = 0_v$, then
we have $A_{\sim c}(u) \not\subseteq F_c(t_v) = 01_u$. Thus constraint $F_c$ is not locally compatible, and following
Theorem 5, PTP or weighted PTP can not be reduced from SDBP for this CSP.

With this example, we see that it is not always the case that SDBP is SP.
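The counterexample can be checked by direct enumeration. The sketch below makes two assumptions that are not confirmed by the extracted text: the pairs of $F_c$ are read as $(x_v, x_u)$, which is the reading under which the stated forceable set {0, 12, 012} on $v$ comes out, and the second coordinate of $F_b$ is a third variable, here called `w` purely as a placeholder name. All function and variable names are hypothetical.

```python
from itertools import combinations

ALPHABET = (0, 1, 2)


def tokens():
    """All non-empty subsets of the alphabet, as frozensets."""
    return [frozenset(c) for r in range(1, len(ALPHABET) + 1)
            for c in combinations(ALPHABET, r)]


# Assumed reading: F_C holds pairs (x_v, x_u); F_B holds pairs (x_u, x_w).
F_C = {(0, 0), (0, 1), (1, 2), (2, 2)}
F_B = {(0, 0), (1, 1), (2, 1)}


def force_first(relation, t_second):
    """Token forced on the first coordinate by a token on the second."""
    return frozenset(a for (a, b) in relation if b in t_second)


def force_second(relation, t_first):
    """Token forced on the second coordinate by a token on the first."""
    return frozenset(b for (a, b) in relation if a in t_first)


# Tokens on v that F_C can force from some token on u.
forceable_on_v = {force_first(F_C, t_u) for t_u in tokens()}

# Largest token on u forceable by F_B; u has only b besides c,
# so this is also the intersection over C(u) \ {c}.
A_b_u = frozenset().union(*(force_first(F_B, t_w) for t_w in tokens()))

# Pick t_v = {0} and see what F_C then forces on u.
t_v = frozenset({0})
forced_on_u = force_second(F_C, t_v)

# Local compatibility at this choice would require A_b_u to be contained
# in forced_on_u.
locally_compatible_here = A_b_u <= forced_on_u
```

Under these assumptions `forceable_on_v` is {0, 12, 012}, `A_b_u` is 012, `forced_on_u` is 01, and `locally_compatible_here` is `False`, in agreement with the conclusion of the example.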

VII. Concluding Remarks

In this paper, we study the question whether SP algorithms (non-weighted and weighted) are 
special cases of BP for general constraint satisfaction problems. 

The first contribution of this paper is a simple formulation of SP algorithms for general CSPs 
as the weighted PTP algorithm. An advantage of this formulation is that it has a probabilistically 
interpretable update rule which allows SP algorithms to be developed for arbitrary CSPs. 

The second and main contribution of this paper is the answer to the titular question in the
most general context. We show that in general, SP algorithms can not be reduced from the BP
algorithm derived from the MRF formalism in the style of [15] and [17]. Such a reduction is
only possible for certain special cases where the state-decoupling condition and the local
compatibility condition are both satisfied.

It is worth noting that our answer to whether SP is BP is restricted to the MRF formalism
in the style of [15] or [17]. Although this restriction is not completely satisfactory, it appears
to us that such an MRF formalism is the most natural in light of the natural correspondence
between the states in the MRF and the SP messages (namely that left states correspond to the
"intentions" of variables and right states correspond to the "commands" of the constraints). An
additional and perhaps even stronger justification of this MRF is its combinatorial descriptive
power as is elaborated in [15] for k-SAT problems, which, using the terminology of this
paper, captures the connectivity of the solution in the space of all "rectangles". In fact, we
conjecture that further investigation of this perspective may provide useful insights into the
algorithm design for solving hard instances of CSPs, whether or not SP or BP is considered as
the choice of algorithms.

Further we note that the BP algorithm has been understood as a special case of Generalized
Belief Propagation (GBP) [20]. In that perspective, BP may be derived from iterative minimiza-
tion of the Bethe approximation of the free energy [20]. The framework of GBP allows
a variety of ways (unified under the notion of "region graphs") to approximate the free energy,
thereby leading to a much richer family of BP-like algorithms. Given the results of this paper,
one may not want to exclude the possibility that a certain choice of free-energy approximation
allows the corresponding GBP to reduce to SP algorithms for general CSPs. Research along that
direction may still be of interest.

As the final remark, however, the authors of this paper would like to raise a philosophical 
question, in light of the simplicity in the (weighted) PTP formulation of SP and, in contrast, the 
complexity involved in reducing BP to SP: Should we attempt to seek a complicated explanation 
for a simple algorithm? Does the simplicity of SP (understood in terms of weighted PTP) imply 
a more natural, simpler but very different underlying graphical model — beyond MRF — that 
may better explain SP? 

^In [15], under the MRF formalism, a Gibbs sampling-based approach has also been presented as an algorithm for solving
random k-SAT problems.

Appendix

We now present some results concerning the dynamics of SP, based on the formulation of
PTP and weighted PTP. These results, although rather elementary, should help provide intuitions
regarding what PTP is doing in solving a CSP. We will start with the deterministic precursor of
PTP, DTP.

A. On the Dynamics of DTP 

We will refer to a subgraph $H$ of factor graph $G$ as a factor-subgraph of $G$ if for every
constraint vertex $F_c$ in $H$, all neighboring variable vertices of $F_c$ in $G$ are also in $H$. It is
apparent that factor-subgraph $H$ is a factor graph representing a CSP involving precisely a
subset of the constraints in $G$. We will denote by $C[H]$ the index set of all constraint vertices
in $H$, by $V[H]$ the index set of all variable vertices in $H$, and by $\Gamma_H$ the set of all assignments
on $V[H]$ that satisfy every constraint $F_c$, $c \in C[H]$.

If factor-subgraph $H$ is a tree, it is also referred to as a factor tree of $G$. For any factor tree $T$
of $G$, we will denote by $L[T]$ the index set of all leaf vertices of $T$. Since we have assumed that
factor graph $G$ contains no degree-1 constraint vertices, it is necessary that the leaf vertices of
any factor tree $T$ of $G$ are all variable vertices, i.e., that $L[T]$ contains no index of any constraint
vertex.

Suppose that $T$ is a factor tree of factor graph $G$, $U \subseteq V[T]$, and $v \in V[T]\setminus U$. For any
rectangle $t_U$ on $U$, define

It is easy to see that function F^^^'(-) reduces to F^'(-) introduced earlier, when T contains a 
single factor and U is V{c) \ {v}. 

Given a factor tree $T$ of $G$ and two vertices in $T$ indexed by $a$ and $b$ respectively, we will
introduce another notation of message index, $a \xrightarrow{T} b$, which indexes the message sent by the
vertex with index $a$ along its only edge that is on the path (in $T$) leading to the vertex with
index $b$. For example, suppose that in factor tree $T$, constraint vertex $F_c$ has a neighbor $x_u$
and is on the path from $x_u$ to $x_v$ in $T$; then message index $u \xrightarrow{T} v$ is equivalent to $u \to c$, and
$t_{u \xrightarrow{T} v}$ is equivalent to $t_{u\to c}$.

A factor tree $T$ of $G$ will be referred to as a $(v, l)$-tree of $G$ if the variable vertex $x_v$ is in $T$,
every leaf vertex in $T$ is distance $2l$ from vertex $x_v$, and all vertices in $G$ that have distance to
$x_v$ no larger than $2l$ are contained in $T$. It is clear that given $G$, $v \in V$ and a positive integer $l$,
if a $(v, l)$-tree of $G$ exists, it is unique. We therefore denote it by $T_v^l$.

Given of factor graph G, factor tree T^_^ of G is the subgraph of induced by vertex x^ 
and all vertices of whose paths to x^ (in T^) traverse through vertex Fc- On the other hand, 
factor tree T^^^ is the subgraph of induced by vertex x^ and all vertices of whose paths 
to Xy (in Tl) do not traverse through vertex T^. 

In what follows, we will use superscript $(l)$ on a message to refer to the message in the $l$-th
iteration.

Proposition 2: Suppose that / > 1 and that factor tree of factor graph G exists. Then in 
iteration / of DTP, 



Proof: We will prove this result by induction on $l$.

For the base case, we have



As the inductive hypothesis, suppose that the result of this proposition holds for a given
iteration number $l \ge 1$. This implies specifically that for every $u \in V(c)\setminus\{v\}$ and every
$b \in C(u)\setminus\{c\}$,



Then



bGC(«)\{c} 



n <rr'^" n ^% 



bec(u)\{c} V^'G^K-J 



/ rr ,(1) 



Finally, 



,«6y(c)\M 



I- 1 n n 



Uy^C \ U-/-C 



^u&V{c)\{v} \weL[Tl,J w—^u 



This completes the proof. ■ 
Translating this result to summary tokens, the following result can be obtained immediately.
Corollary 2: Suppose that $l \ge 1$ and that factor tree $T_v^l$ of factor graph $G$ exists. Then in
iteration $l$ of DTP,



The implication of this result is that on a factor graph with sufficiently large girth, DTP is in
fact very well-behaved: the summary token at any variable $x_v$ in iteration $l$ depends precisely
on the initial tokens passed by variables that are distance $2l$ away from $x_v$. Specifically, one may view
those tokens as forming a rectangle on $L[T_v^l]$, and the summary token at $x_v$ in iteration $l$ is precisely
the set of all assignments on $\{v\}$ that can make $T_v^l$ satisfied, given that the assignment on $L[T_v^l]$ is
from that rectangle.

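Now we develop some results of DTP that require no "local cycle-freeness" in the factor
graph.

Lemma 15: At every $v \in V$ and for any $l$,
\[
t^{(l)}_v = \bigcap_{c\in C(v)} t^{(l+1)}_{v\to c}.
\]

Proof: Suppose that $x_v \in t^{(l)}_v$. Then $x_v \in t^{(l)}_{c\to v}$ for every $c \in C(v)$, by the definition of
summary messages. It follows that $x_v \in t^{(l+1)}_{v\to c}$ for every $c \in C(v)$. Then $x_v \in \bigcap_{c\in C(v)} t^{(l+1)}_{v\to c}$.
This shows that $t^{(l)}_v \subseteq \bigcap_{c\in C(v)} t^{(l+1)}_{v\to c}$.

On the other hand, suppose that $x_v \in \bigcap_{c\in C(v)} t^{(l+1)}_{v\to c}$. Then $x_v \in t^{(l+1)}_{v\to c} = \bigcap_{b\in C(v)\setminus\{c\}} t^{(l)}_{b\to v}$
for every $c \in C(v)$. It follows that $x_v \in t^{(l)}_{b\to v}$ for every $b \in C(v)$, giving rise to
$x_v \in \bigcap_{b\in C(v)} t^{(l)}_{b\to v} = t^{(l)}_v$. Thus $\bigcap_{c\in C(v)} t^{(l+1)}_{v\to c} \subseteq t^{(l)}_v$.

Therefore $t^{(l)}_v = \bigcap_{c\in C(v)} t^{(l+1)}_{v\to c}$. ■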
Lemma 16: Suppose that xy is a satisfying assignment on V, namely that xy satisfies O. If 

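$x_V \in \prod_{v\in V} \bigcap_{c\in C(v)} t^{(l)}_{v\to c}$ in some iteration $l$, then $x_V \in \prod_{v\in V} t^{(l)}_v$.

Proof: The fact that $x_V \in \prod_{v\in V} \bigcap_{c\in C(v)} t^{(l)}_{v\to c}$ implies that for every $v$ and $c \in C(v)$,
$x_{V:\{v\}} \in \bigcap_{c\in C(v)} t^{(l)}_{v\to c} \subseteq t^{(l)}_{v\to c}$, and hence via the "monotonicity" of function $F_c$,
$F_c\left(\{x_{V:V(c)\setminus\{v\}}\}\right) \subseteq F_c\left(\prod_{u\in V(c)\setminus\{v\}} t^{(l)}_{u\to c}\right) = t^{(l)}_{c\to v}$. Incorporating that $x_V$ is a
satisfying assignment, we see that
$x_{V:\{v\}} \in F_c\left(\{x_{V:V(c)\setminus\{v\}}\}\right) \subseteq t^{(l)}_{c\to v}$ for every $v \in V$ and $c \in C(v)$. Thus
$x_{V:\{v\}} \in \bigcap_{c\in C(v)} t^{(l)}_{c\to v} = t^{(l)}_v$. It then follows that $x_V \in \prod_{v\in V} t^{(l)}_v$. ■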
Proposition 3: Suppose that $x_V$ is a satisfying assignment and that the initialization of DTP
is such that $x_{V:\{v\}} \in t^{(1)}_{v\to c}$ for every $v \in V$ and $c \in C(v)$. Then in any iteration $l$, the rectangle
$\prod_{v\in V} t^{(l)}_v$ formed by the summary tokens contains $x_V$.

Proof: At iteration 1, the fact that $x_{V:\{v\}} \in t^{(1)}_{v\to c}$ for every $v \in V$ and $c \in C(v)$ implies that
$x_V \in \prod_{v\in V} \bigcap_{c\in C(v)} t^{(1)}_{v\to c}$. Then by Lemma 16, we have $x_V \in \prod_{v\in V} t^{(1)}_v$.
As the inductive hypothesis, suppose we have $x_V \in \prod_{v\in V} t^{(l)}_v$ at iteration $l$. At iteration
$l+1$, by Lemma 15, we have $x_V \in \prod_{v\in V} \bigcap_{c\in C(v)} t^{(l+1)}_{v\to c}$. Then by Lemma 16,
$x_V \in \prod_{v\in V} t^{(l+1)}_v$.

Therefore, this proposition is proved by induction. ■
At this point, we have shown that if DTP is initialized to "contain" a satisfying assignment,
then this assignment is contained in the rectangle formed by the summary tokens in all iterations.
That is, the solution of the CSP will never get "lost" during DTP iteration provided that it is
contained in the initial rectangle. This result (Proposition 3) and Corollary 2 presented earlier
will become useful when we discuss the dynamics of PTP.
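The invariance stated in Proposition 3 can be observed on a toy instance. The sketch below runs a DTP-style iteration for 3-COL on a 4-cycle. The update rules are inferred from the proofs of Lemmas 15 and 16 (a left token is the intersection of the other incoming right tokens, a right token is the constraint applied to the incoming left token, and the summary is the intersection of all incoming right tokens); since DTP is defined earlier in the paper, these rules and all names below should be treated as assumptions of this illustration.

```python
ALPHABET = frozenset({1, 2, 3})
# Each edge is one constraint x_u != x_v; the index in the list is c.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]
# A proper 3-colouring of the 4-cycle, used as the satisfying assignment.
SOLUTION = {0: 1, 1: 2, 2: 3, 3: 2}


def force(t_u):
    """Token forced on the other endpoint of an edge, given token t_u."""
    return frozenset(x for x in ALPHABET if any(y != x for y in t_u))


def constraints_of(v):
    return [c for c, edge in enumerate(EDGES) if v in edge]


def other_end(c, v):
    a, b = EDGES[c]
    return b if v == a else a


def dtp(left, iterations):
    """left maps (v, c) -> token sent from variable v to constraint c.
    Returns the list of summary-token dicts, one per iteration."""
    history = []
    for _ in range(iterations):
        # Right tokens: constraint c tells v what the other endpoint allows.
        right = {(c, v): force(left[(other_end(c, v), c)]) for (v, c) in left}
        # Summary token at v: intersection of all incoming right tokens.
        summary = {v: frozenset.intersection(
                       *(right[(c, v)] for c in constraints_of(v)))
                   for v in SOLUTION}
        history.append(summary)
        # Next left tokens: intersection of the other incoming right tokens.
        left = {(v, c): ALPHABET.intersection(
                    *(right[(b, v)] for b in constraints_of(v) if b != c))
                for (v, c) in left}
    return history


# Initial rectangle: variable 0 announces its true colour, the rest are free.
left0 = {(v, c): ALPHABET for v in SOLUTION for c in constraints_of(v)}
for c in constraints_of(0):
    left0[(0, c)] = frozenset({SOLUTION[0]})

history = dtp(left0, 5)
```

Every initial left token contains the corresponding coordinate of `SOLUTION`, and in each of the five recorded iterations the summary token at every variable still contains that coordinate, as Proposition 3 predicts.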

B. On the Dynamics of PTP and Weighted PTP 
We now turn our attention to (non-weighted) PTP.

Denote by $G_v^l$ the factor-subgraph of $G$ which contains all factors whose messages have
propagated to variable $x_v$ by the end of PTP iteration $l$. That is, $G_v^l$ is the factor-subgraph of $G$
that contains variable vertex $x_v$ and all vertices whose distances to $x_v$ are no larger than $2l$. It
is apparent that if $G_v^l$ is a tree, then it is the $(v, l)$ factor tree $T_v^l$.

Let $l^*$ be the smallest $l$ such that for at least one $v \in V$, $T_v^l$ does not exist. Denote
$m_v(l) := |(\Gamma_{G_v^l})_{:\{v\}}|$. That is, $m_v(l)$ is the number of assignments of variable $x_v$ that can make all
constraints in $G_v^l$ satisfied. Clearly, $m_v(l)$ is a non-increasing function of $l$.

We will first restrict the CSP to a "single-solution CSP", i.e., having exactly one satisfying
assignment. We will denote this assignment on $V$ by $x_V$.

Let $l$ be the smallest iteration number for which $\min_{v} m_v(l) = 1$. It is worth noting that such $l$ exists since
the CSP has precisely one solution. Let $v$ satisfy $m_v(l) = 1$.

Proposition 4: Let factor graph $G$ represent a single-solution CSP. Suppose that the initial-
ization of PTP is such that every left message $\lambda^{(1)}_{v\to c}(t)$ is strictly positive for every $t \in (\chi^*)^{\{v\}}$.
If $l < l^*$, then
\[
\mu^{\mathrm{PTP},(l)}_{v}(t) = [t = \{x_{V:\{v\}}\}].
\]

Proof: This result relies on Corollary 2.

First, $\hat{l} < l^*$ implies that the $(\hat{v}, \hat{l})$ factor tree $T_{\hat{v}}^{\hat{l}}$ exists. Then by Corollary 2, if DTP is initialized
such that the tokens sent from the leaves of $T_{\hat{v}}^{\hat{l}}$ form the rectangle $\prod_{u \in L[T_{\hat{v}}^{\hat{l}}]} t_u$, then the summary token at $\hat{v}$
in the $\hat{l}$-th iteration is $F^{T_{\hat{v}}^{\hat{l}}}\big(\prod_{u \in L[T_{\hat{v}}^{\hat{l}}]} t_u\big)$.

Since $\hat{v}$ satisfies $m_{\hat{v}}(\hat{l}) = 1$, it is necessary that $F^{T_{\hat{v}}^{\hat{l}}}\big(\prod_{u \in L[T_{\hat{v}}^{\hat{l}}]} t_u\big)$ is either the token $\{\hat{x}_{V:\{\hat{v}\}}\}$
or $\emptyset$, depending on the rectangle initialized.

Now PTP on $T_{\hat{v}}^{\hat{l}}$, with respect to $x_{\hat{v}}$, may be understood as initializing a random rectangle
on $L[T_{\hat{v}}^{\hat{l}}]$ (the distribution of which is characterized by the product of the initial messages),
transforming the random rectangle into a random token on $\hat{v}$ via the functional mapping $F^{T_{\hat{v}}^{\hat{l}}}(\cdot)$,
and conditioning on the resulting token being valid (a non-empty set). The fact that the initial messages
of PTP are strictly positive ensures that every rectangle on $L[T_{\hat{v}}^{\hat{l}}]$ has non-zero probability at
initialization. After conditioning on the resulting token being valid, the token $\emptyset$ is removed from
the allowed realizations of the resulting token, and thus the resulting token equals $\{\hat{x}_{V:\{\hat{v}\}}\}$ with
probability 1. This completes the proof. ■

This result and its proof can be easily extended to a somewhat larger family of CSPs each 
containing multiple solutions, as shown in the next proposition. 

Proposition 5: Suppose that in the CSP, there exist a coordinate $v \in V$ and an assignment
$\hat{x}_v \in (\chi^*)^{\{v\}}$ such that every satisfying configuration $x_V \in \Gamma$ satisfies $x_{V:\{v\}} = \hat{x}_v$. If for some
integer $l$, $T_v^l$ exists and $m_v(l) = 1$, then

$$\mu_v^{\mathrm{PTP},l}(t) = [t = \{\hat{x}_v\}].$$

The proof is similar to that of Proposition 4, which essentially relies on Corollary 2 and the fact that
the local tree rooted at $v$ is large enough. Skipping the proof, we note that Proposition 4 may
be viewed as a special case of Proposition 5.

Based on the results above, we provide some remarks concerning the dynamics of PTP and 
argue intuitively how it solves a CSP. 

1) As argued in the proof of Proposition 4, the key insight into what PTP does is that
it updates a random rectangle whose sides are distributed independently.

At the initialization stage, PTP defines a random rectangle on V, where the sides of the
random rectangle are treated as independent random variables. In every iteration, PTP
maps this random rectangle to a new random rectangle in the following steps.

a) Apply a functional mapping defined by the right-message update rule and the left- 
message update rule. 

b) Eliminate the resulting empty rectangles (by conditioning on every side of the
resulting random rectangle being non-empty, followed by re-normalization).
c) Take the marginal distribution of the resulting random rectangle on each side variable,
and treat all sides as independent random variables. This defines a new random
rectangle.

PTP iterates over these steps to continuously update the random rectangle. 

2) For single-solution CSPs, by Proposition 4, if the girth of the graph is large enough,
at least one side of the new rectangle becomes deterministic after some iterations, namely
the singleton set containing the correct assignment for that variable. This allows the
decimation procedure to fix this variable to the correct assignment and reduce the problem.
Similar results hold for CSPs having more than one solution but in which all solutions
share a single assignment on some coordinate. By Proposition 5, in this case, when the
local tree rooted at that variable is sufficiently large, PTP will find that variable and its
correct assignment. Of course, the conditions of Proposition 4 and Proposition 5,
namely that there is a sufficiently large local tree rooted at a variable and that the variable
has only one correct assignment, may not hold in reality. As a consequence, no side of the
random rectangle is deterministically a singleton. In that case, the decimation procedure
must deal with this ambiguity, which results from the non-ideal factor graph structure and the
complexity of the solution space, and make a good guess to fix a variable.

3) Proposition 4 and Proposition 5 also suggest that when the graph has large girth (and
when the solutions share one common assignment on some coordinate), as PTP iterates,
the rectangles containing no solutions will be gradually removed from the sample space
of the random rectangle.

4) Proposition 3 implies that, regardless of the cycle structure of the graph, all solution-containing
rectangles are kept (possibly merged with one another) over PTP iterations.

5) Combining 3) and 4) above, one may view each PTP iteration as performing a "filtering"
operation on the distribution of the random rectangle. As the distribution of the random
rectangle evolves, the probability mass gradually shifts toward solution-containing
rectangles. When the graph has large girth and some coordinate is in a "favorable"
position (in a sense combining its location in the graph and its role in the solution
space), the summary message at this coordinate may become more deterministically biased
to a singleton token, making decimation possible.
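
Steps a) to c) of remark 1) can be illustrated with a small numerical sketch. It is not the PTP update of this paper: the deterministic map `toy_mapping`, which encodes the two hypothetical constraints x != y and x == 0 on a binary alphabet, merely stands in for the functional mapping of step a), and the rectangle has one side per variable rather than one per edge. The names `update` and `toy_mapping` are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

VALUES = (0, 1)


def toy_mapping(rect):
    """Hypothetical stand-in for step a): constraints x != y and x == 0."""
    tx, ty = rect
    new_x = frozenset(a for a in VALUES if a == 0 and any(a != b for b in ty))
    new_y = frozenset(b for b in VALUES if any(a != b for a in tx))
    return (new_x, new_y)


def update(sides, mapping):
    """sides: list of {token: probability}; returns the new list of side marginals."""
    joint = defaultdict(float)
    # Sides are independent, so the probability of a rectangle is a product.
    for rect in product(*(side.items() for side in sides)):
        tokens = tuple(t for t, _ in rect)
        prob = 1.0
        for _, p in rect:
            prob *= p
        joint[mapping(tokens)] += prob  # step a): functional mapping
    # step b): discard rectangles with an empty side, then re-normalize
    valid = {r: p for r, p in joint.items() if all(r)}
    total = sum(valid.values())
    if total == 0:
        raise ValueError("every rectangle became empty")
    # step c): keep only the per-side marginals and treat sides as independent
    new_sides = [defaultdict(float) for _ in sides]
    for r, p in valid.items():
        for i, token in enumerate(r):
            new_sides[i][token] += p / total
    return [dict(s) for s in new_sides]


tokens = [frozenset({0}), frozenset({1}), frozenset({0, 1})]
# Strictly positive initialization on every non-empty token of each side.
sides = [dict(zip(tokens, (0.25, 0.25, 0.5))) for _ in range(2)]
for _ in range(3):
    sides = update(sides, toy_mapping)
print(sides)
```

In this toy run the side for x collapses onto the singleton token {0} after one update and the side for y onto {1} after two, which mirrors the behaviour described in remarks 2) and 5): the probability mass concentrates on the rectangle containing the unique solution (x, y) = (0, 1).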
Finally, we briefly remark on weighted PTP.

Similar to PTP, weighted PTP also updates a random rectangle. However, instead of using
a functional mapping in step a) of the above procedure, it uses a conditional distribution.
By examining the form of the obedience conditionals, it is intuitive that, compared with PTP,
weighted PTP shifts the distribution of each side of the random rectangle more towards "smaller"
tokens on each coordinate. (Here $t_v$ is said to be smaller than $t'_v$ if $t_v \subset t'_v$.) This gives the
algorithm a better chance of driving some side of the random rectangle to be more deterministically
biased to a singleton.